[jira] [Commented] (STORM-960) Hive-Bolt can lose tuples when flushing data
[ https://issues.apache.org/jira/browse/STORM-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639705#comment-14639705 ] Sriharsha Chintalapani commented on STORM-960: -- [~doss...@gmail.com] can you send just your commit only easy to review. Hive-Bolt can lose tuples when flushing data Key: STORM-960 URL: https://issues.apache.org/jira/browse/STORM-960 Project: Apache Storm Issue Type: Improvement Components: external Reporter: Aaron Dossett Assignee: Aaron Dossett Priority: Minor In HiveBolt's execute method tuples are ack'd as they are received. When a batchsize of tuples has been received, the writers are flushed. However, if the flush fails only the most recent tuple will be marked as failed. All prior tuples will already have been ack'd. This creates a window for data loss. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: Eventhub spout meta data
Github user rsltrifork commented on the pull request: https://github.com/apache/storm/pull/651#issuecomment-124075611 Event hub (and Kafka) play well into event source architectures as event ingest point for later Storm processing to downstream stateful consumers. Advanced event stream processing, such as replaying parts of a stream, requires that the downstream consumers can synchronise different stream runs to their stateful view, which itself can be seen as an aggregation of all previous events. To set up the right context for re-processing the stream in a deterministic way, they need to sync their view with the incoming old data. To be able to do this, they need knowledge of the event sequenceNumber and partition. For example, if you have a bolt that calculates total_order_amount for a stream of orders, and emits order tuples with the total_order_amount calculated for all previous orders, replaying an order event should not change total_order_amount. I.e. orders with a higher sequenceNumber than the order being processed should not be included in total_order_amount. This synchronisation can be achieved if the bolt has access to the parition and sequenceNumber from eventHub. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Comment Edited] (STORM-297) Storm Performance cannot be scaled up by adding more CPU cores
[ https://issues.apache.org/jira/browse/STORM-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638772#comment-14638772 ] jia.fu edited comment on STORM-297 at 7/23/15 1:10 PM: --- I meet this issue yet .anyone has a solution on production storm cluster ? was (Author: jia.fu): I met this issue yet .anyone has a solution on production storm cluster ? Storm Performance cannot be scaled up by adding more CPU cores -- Key: STORM-297 URL: https://issues.apache.org/jira/browse/STORM-297 Project: Apache Storm Issue Type: Bug Reporter: Sean Zhong Assignee: Sean Zhong Labels: Performance, netty Fix For: 0.9.2-incubating Attachments: Storm_performance_fix.pdf, storm_Netty_receiver_diagram.png, storm_conf.txt, storm_performance_fix.patch, worker_throughput_without_storm-297.png We cannot scale up the performance by adding more CPU cores and increasing parallelism. For a 2 layer topology Spout ---shuffle grouping-- bolt, when message size is small (around 100 bytes), we can find in the below picture that neither the CPU nor the network is saturated. When message size is 100 bytes, only 40% of CPU is used, only 18% of network is used, although we have a high parallelism (overall we have 144 executors) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-959) remove unnecessary dependency from storm-hive/pom.xml
[ https://issues.apache.org/jira/browse/STORM-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638743#comment-14638743 ] Aaron Dossett commented on STORM-959: - This is a duplicate of STORM-944, which has a similar PR pending. remove unnecessary dependency from storm-hive/pom.xml - Key: STORM-959 URL: https://issues.apache.org/jira/browse/STORM-959 Project: Apache Storm Issue Type: Bug Components: external Affects Versions: 0.11.0 Reporter: caofangkun Assignee: caofangkun Priority: Minor [org.apache.calcite:calcite-core:0.9.2-incubating|https://github.com/apache/storm/blob/master/external/storm-hive/pom.xml#L84] does not take affect at all. Becase of hive-exec org.apache.calcite:calcite-core:0.9.2-incubating-SNAPSHOT will be downloaded {code} Downloading: https://repository.apache.org/snapshots/org/apache/calcite/calcite-core/0.9.2-incubating-SNAPSHOT/maven-metadata.xml Downloading: https://repository.apache.org/snapshots/org/apache/calcite/calcite-core/0.9.2-incubating-SNAPSHOT/calcite-core-0.9.2-incubating-SNAPSHOT.pom [WARNING] The POM for org.apache.calcite:calcite-core:jar:0.9.2-incubating-SNAPSHOT is missing, no dependency information available {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-297) Storm Performance cannot be scaled up by adding more CPU cores
[ https://issues.apache.org/jira/browse/STORM-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638772#comment-14638772 ] jia.fu commented on STORM-297: -- I met this issue yet .anyone has a solution on production storm cluster ? Storm Performance cannot be scaled up by adding more CPU cores -- Key: STORM-297 URL: https://issues.apache.org/jira/browse/STORM-297 Project: Apache Storm Issue Type: Bug Reporter: Sean Zhong Assignee: Sean Zhong Labels: Performance, netty Fix For: 0.9.2-incubating Attachments: Storm_performance_fix.pdf, storm_Netty_receiver_diagram.png, storm_conf.txt, storm_performance_fix.patch, worker_throughput_without_storm-297.png We cannot scale up the performance by adding more CPU cores and increasing parallelism. For a 2 layer topology Spout ---shuffle grouping-- bolt, when message size is small (around 100 bytes), we can find in the below picture that neither the CPU nor the network is saturated. When message size is 100 bytes, only 40% of CPU is used, only 18% of network is used, although we have a high parallelism (overall we have 144 executors) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: STORM-845 Storm ElasticSearch connector
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388631 --- Diff: external/storm-elasticsearch/README.md --- @@ -0,0 +1,66 @@ +# Storm Elasticsearch Bolt Trident State + + EsIndexBolt, EsPercolateBolt and EsState allows users to stream data from storm into Elasticsearch directly. + For detailed description, please refer to the following. + +## EsIndexBolt (org.apache.storm.elasticsearch.bolt.EsIndexBolt) + +EsIndexBolt streams tuples directly into Elasticsearch. Tuples are indexed in specified index type combination. +User should make sure that there are source, index,type, and id fields declared in preceding bolts or spout. +index and type fields are used for identifying target index and type. +source is a document in JSON format string that will be indexed in elastic search. --- End diff -- elastic search = Elasticsearch --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (STORM-845) Storm ElasticSearch connector
[ https://issues.apache.org/jira/browse/STORM-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639789#comment-14639789 ] ASF GitHub Bot commented on STORM-845: -- Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/573#issuecomment-124287695 @sweetest The code looks great to me overall, too. Btw, actually I read your post from your company's official technical blog. (http://d2.naver.com/helloworld/1044388) This module covers your use case, but IMO some functionalities should be added to cover more (and general) use cases. Surely, we can just add to another issues, and implement it later. Here's the list. - Introduces Tuple - ES document mapper to get rid of constant field mapping (source, index, type) - Introduces Lookup(or Query)Bolt which retrieves matched documents from ES and emit them - Introduces BaseQueryFunction which is same to Lookup(or Query)Bolt but can be used in Trident topology For Lookup(or Query)Bolt and BaseQueryFunction, we need to introduce ES document - Tuple mapper, and we may want to set which field is used for querying. We may also want to consider about the strategy - what we can do when too many documents are matched. @revans2 @harshach @sweetest Please comment me when you're unsure about features I've suggested. Thanks! Storm ElasticSearch connector - Key: STORM-845 URL: https://issues.apache.org/jira/browse/STORM-845 Project: Apache Storm Issue Type: New Feature Reporter: Adrian Seungjin Lee Assignee: Adrian Seungjin Lee It would be nice to provide storm driver for elasticsearch, just like it does for hive, redis and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: STORM-845 Storm ElasticSearch connector
Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/573#issuecomment-124287695 @sweetest The code looks great to me overall, too. Btw, actually I read your post from your company's official technical blog. (http://d2.naver.com/helloworld/1044388) This module covers your use case, but IMO some functionalities should be added to cover more (and general) use cases. Surely, we can just add to another issues, and implement it later. Here's the list. - Introduces Tuple - ES document mapper to get rid of constant field mapping (source, index, type) - Introduces Lookup(or Query)Bolt which retrieves matched documents from ES and emit them - Introduces BaseQueryFunction which is same to Lookup(or Query)Bolt but can be used in Trident topology For Lookup(or Query)Bolt and BaseQueryFunction, we need to introduce ES document - Tuple mapper, and we may want to set which field is used for querying. We may also want to consider about the strategy - what we can do when too many documents are matched. @revans2 @harshach @sweetest Please comment me when you're unsure about features I've suggested. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] storm pull request: STORM-845 Storm ElasticSearch connector
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388752 --- Diff: external/storm-elasticsearch/README.md --- @@ -0,0 +1,66 @@ +# Storm Elasticsearch Bolt Trident State + + EsIndexBolt, EsPercolateBolt and EsState allows users to stream data from storm into Elasticsearch directly. + For detailed description, please refer to the following. + +## EsIndexBolt (org.apache.storm.elasticsearch.bolt.EsIndexBolt) + +EsIndexBolt streams tuples directly into Elasticsearch. Tuples are indexed in specified index type combination. +User should make sure that there are source, index,type, and id fields declared in preceding bolts or spout. +index and type fields are used for identifying target index and type. +source is a document in JSON format string that will be indexed in elastic search. + +```java +EsConfig esConfig = new EsConfig(); +esConfig.setClusterName(clusterName); +esConfig.setNodes(new String[]{localhost:9300}); +EsIndexBolt indexBolt = new IndexBolt(esConfig); +``` + +## EsPercolateBolt (org.apache.storm.elasticsearch.bolt.EsPercolateBolt) + +EsPercolateBolt streams tuples directly into Elasticsearch. Tuples are used to send percolate request to specified index type combination. +User should make sure that there are source, index, and type fields declared in preceding bolts or spout. +index and type fields are used for identifying target index and type. +source is a document in JSON format string that will be sent in percolate request to elastic search. + +```java +EsConfig esConfig = new EsConfig(); +esConfig.setClusterName(clusterName); +esConfig.setNodes(new String[]{localhost:9300}); +EsPercolateBolt percolateBolt = new EsPercolateBolt(esConfig); +``` + +### EsConfig (org.apache.storm.elasticsearch.common.EsConfig) + +Two bolts above takes in EsConfig as a constructor arg. + + ```java + EsConfig esConfig = new EsConfig(); + esConfig.setClusterName(clusterName); + esConfig.setNodes(new String[]{localhost:9300}); + ``` + +EsConfig params --- End diff -- ```EsConfig params = ### EsConfig params``` It would be better to emphasize this section. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (STORM-845) Storm ElasticSearch connector
[ https://issues.apache.org/jira/browse/STORM-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639776#comment-14639776 ] ASF GitHub Bot commented on STORM-845: -- Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388752 --- Diff: external/storm-elasticsearch/README.md --- @@ -0,0 +1,66 @@ +# Storm Elasticsearch Bolt Trident State + + EsIndexBolt, EsPercolateBolt and EsState allows users to stream data from storm into Elasticsearch directly. + For detailed description, please refer to the following. + +## EsIndexBolt (org.apache.storm.elasticsearch.bolt.EsIndexBolt) + +EsIndexBolt streams tuples directly into Elasticsearch. Tuples are indexed in specified index type combination. +User should make sure that there are source, index,type, and id fields declared in preceding bolts or spout. +index and type fields are used for identifying target index and type. +source is a document in JSON format string that will be indexed in elastic search. + +```java +EsConfig esConfig = new EsConfig(); +esConfig.setClusterName(clusterName); +esConfig.setNodes(new String[]{localhost:9300}); +EsIndexBolt indexBolt = new IndexBolt(esConfig); +``` + +## EsPercolateBolt (org.apache.storm.elasticsearch.bolt.EsPercolateBolt) + +EsPercolateBolt streams tuples directly into Elasticsearch. Tuples are used to send percolate request to specified index type combination. +User should make sure that there are source, index, and type fields declared in preceding bolts or spout. +index and type fields are used for identifying target index and type. +source is a document in JSON format string that will be sent in percolate request to elastic search. + +```java +EsConfig esConfig = new EsConfig(); +esConfig.setClusterName(clusterName); +esConfig.setNodes(new String[]{localhost:9300}); +EsPercolateBolt percolateBolt = new EsPercolateBolt(esConfig); +``` + +### EsConfig (org.apache.storm.elasticsearch.common.EsConfig) + +Two bolts above takes in EsConfig as a constructor arg. + + ```java + EsConfig esConfig = new EsConfig(); + esConfig.setClusterName(clusterName); + esConfig.setNodes(new String[]{localhost:9300}); + ``` + +EsConfig params --- End diff -- ```EsConfig params = ### EsConfig params``` It would be better to emphasize this section. Storm ElasticSearch connector - Key: STORM-845 URL: https://issues.apache.org/jira/browse/STORM-845 Project: Apache Storm Issue Type: New Feature Reporter: Adrian Seungjin Lee Assignee: Adrian Seungjin Lee It would be nice to provide storm driver for elasticsearch, just like it does for hive, redis and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: STORM-845 Storm ElasticSearch connector
Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/573#issuecomment-12420 @sweetest And I saw elasticsearch-hadoop has EsSpout, but this module doesn't include Spout implementation. https://github.com/elastic/elasticsearch-hadoop/blob/master/storm/src/main/java/org/elasticsearch/storm/EsSpout.java Did you take a look at EsSpout and decided it is not necessary for current module? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (STORM-959) remove unnecessary dependency from storm-hive/pom.xml
[ https://issues.apache.org/jira/browse/STORM-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639820#comment-14639820 ] ASF GitHub Bot commented on STORM-959: -- Github user caofangkun commented on the pull request: https://github.com/apache/storm/pull/652#issuecomment-124295316 duplicates #609 remove unnecessary dependency from storm-hive/pom.xml - Key: STORM-959 URL: https://issues.apache.org/jira/browse/STORM-959 Project: Apache Storm Issue Type: Bug Components: external Affects Versions: 0.11.0 Reporter: caofangkun Assignee: caofangkun Priority: Minor [org.apache.calcite:calcite-core:0.9.2-incubating|https://github.com/apache/storm/blob/master/external/storm-hive/pom.xml#L84] does not take affect at all. Becase of hive-exec org.apache.calcite:calcite-core:0.9.2-incubating-SNAPSHOT will be downloaded {code} Downloading: https://repository.apache.org/snapshots/org/apache/calcite/calcite-core/0.9.2-incubating-SNAPSHOT/maven-metadata.xml Downloading: https://repository.apache.org/snapshots/org/apache/calcite/calcite-core/0.9.2-incubating-SNAPSHOT/calcite-core-0.9.2-incubating-SNAPSHOT.pom [WARNING] The POM for org.apache.calcite:calcite-core:jar:0.9.2-incubating-SNAPSHOT is missing, no dependency information available {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: STORM-845 Storm ElasticSearch connector
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388649 --- Diff: external/storm-elasticsearch/README.md --- @@ -0,0 +1,66 @@ +# Storm Elasticsearch Bolt Trident State + + EsIndexBolt, EsPercolateBolt and EsState allows users to stream data from storm into Elasticsearch directly. + For detailed description, please refer to the following. + +## EsIndexBolt (org.apache.storm.elasticsearch.bolt.EsIndexBolt) + +EsIndexBolt streams tuples directly into Elasticsearch. Tuples are indexed in specified index type combination. +User should make sure that there are source, index,type, and id fields declared in preceding bolts or spout. +index and type fields are used for identifying target index and type. +source is a document in JSON format string that will be indexed in elastic search. + +```java +EsConfig esConfig = new EsConfig(); +esConfig.setClusterName(clusterName); +esConfig.setNodes(new String[]{localhost:9300}); +EsIndexBolt indexBolt = new IndexBolt(esConfig); --- End diff -- new IndexBolt(esConfig) = new EsIndexBolt(esConfig) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] storm pull request: STORM-845 Storm ElasticSearch connector
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388715 --- Diff: external/storm-elasticsearch/README.md --- @@ -0,0 +1,66 @@ +# Storm Elasticsearch Bolt Trident State + + EsIndexBolt, EsPercolateBolt and EsState allows users to stream data from storm into Elasticsearch directly. + For detailed description, please refer to the following. + +## EsIndexBolt (org.apache.storm.elasticsearch.bolt.EsIndexBolt) + +EsIndexBolt streams tuples directly into Elasticsearch. Tuples are indexed in specified index type combination. +User should make sure that there are source, index,type, and id fields declared in preceding bolts or spout. +index and type fields are used for identifying target index and type. +source is a document in JSON format string that will be indexed in elastic search. + +```java +EsConfig esConfig = new EsConfig(); +esConfig.setClusterName(clusterName); +esConfig.setNodes(new String[]{localhost:9300}); +EsIndexBolt indexBolt = new IndexBolt(esConfig); +``` + +## EsPercolateBolt (org.apache.storm.elasticsearch.bolt.EsPercolateBolt) + +EsPercolateBolt streams tuples directly into Elasticsearch. Tuples are used to send percolate request to specified index type combination. +User should make sure that there are source, index, and type fields declared in preceding bolts or spout. +index and type fields are used for identifying target index and type. +source is a document in JSON format string that will be sent in percolate request to elastic search. + +```java +EsConfig esConfig = new EsConfig(); +esConfig.setClusterName(clusterName); +esConfig.setNodes(new String[]{localhost:9300}); +EsPercolateBolt percolateBolt = new EsPercolateBolt(esConfig); +``` + +### EsConfig (org.apache.storm.elasticsearch.common.EsConfig) --- End diff -- ### EsConfig = ## EsConfig (EsConfig is not only for EsPercolateBolt.) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (STORM-845) Storm ElasticSearch connector
[ https://issues.apache.org/jira/browse/STORM-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639769#comment-14639769 ] ASF GitHub Bot commented on STORM-845: -- Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388631 --- Diff: external/storm-elasticsearch/README.md --- @@ -0,0 +1,66 @@ +# Storm Elasticsearch Bolt Trident State + + EsIndexBolt, EsPercolateBolt and EsState allows users to stream data from storm into Elasticsearch directly. + For detailed description, please refer to the following. + +## EsIndexBolt (org.apache.storm.elasticsearch.bolt.EsIndexBolt) + +EsIndexBolt streams tuples directly into Elasticsearch. Tuples are indexed in specified index type combination. +User should make sure that there are source, index,type, and id fields declared in preceding bolts or spout. +index and type fields are used for identifying target index and type. +source is a document in JSON format string that will be indexed in elastic search. --- End diff -- elastic search = Elasticsearch Storm ElasticSearch connector - Key: STORM-845 URL: https://issues.apache.org/jira/browse/STORM-845 Project: Apache Storm Issue Type: New Feature Reporter: Adrian Seungjin Lee Assignee: Adrian Seungjin Lee It would be nice to provide storm driver for elasticsearch, just like it does for hive, redis and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-845) Storm ElasticSearch connector
[ https://issues.apache.org/jira/browse/STORM-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639771#comment-14639771 ] ASF GitHub Bot commented on STORM-845: -- Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388649 --- Diff: external/storm-elasticsearch/README.md --- @@ -0,0 +1,66 @@ +# Storm Elasticsearch Bolt Trident State + + EsIndexBolt, EsPercolateBolt and EsState allows users to stream data from storm into Elasticsearch directly. + For detailed description, please refer to the following. + +## EsIndexBolt (org.apache.storm.elasticsearch.bolt.EsIndexBolt) + +EsIndexBolt streams tuples directly into Elasticsearch. Tuples are indexed in specified index type combination. +User should make sure that there are source, index,type, and id fields declared in preceding bolts or spout. +index and type fields are used for identifying target index and type. +source is a document in JSON format string that will be indexed in elastic search. + +```java +EsConfig esConfig = new EsConfig(); +esConfig.setClusterName(clusterName); +esConfig.setNodes(new String[]{localhost:9300}); +EsIndexBolt indexBolt = new IndexBolt(esConfig); --- End diff -- new IndexBolt(esConfig) = new EsIndexBolt(esConfig) Storm ElasticSearch connector - Key: STORM-845 URL: https://issues.apache.org/jira/browse/STORM-845 Project: Apache Storm Issue Type: New Feature Reporter: Adrian Seungjin Lee Assignee: Adrian Seungjin Lee It would be nice to provide storm driver for elasticsearch, just like it does for hive, redis and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-845) Storm ElasticSearch connector
[ https://issues.apache.org/jira/browse/STORM-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639772#comment-14639772 ] ASF GitHub Bot commented on STORM-845: -- Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388715 --- Diff: external/storm-elasticsearch/README.md --- @@ -0,0 +1,66 @@ +# Storm Elasticsearch Bolt Trident State + + EsIndexBolt, EsPercolateBolt and EsState allows users to stream data from storm into Elasticsearch directly. + For detailed description, please refer to the following. + +## EsIndexBolt (org.apache.storm.elasticsearch.bolt.EsIndexBolt) + +EsIndexBolt streams tuples directly into Elasticsearch. Tuples are indexed in specified index type combination. +User should make sure that there are source, index,type, and id fields declared in preceding bolts or spout. +index and type fields are used for identifying target index and type. +source is a document in JSON format string that will be indexed in elastic search. + +```java +EsConfig esConfig = new EsConfig(); +esConfig.setClusterName(clusterName); +esConfig.setNodes(new String[]{localhost:9300}); +EsIndexBolt indexBolt = new IndexBolt(esConfig); +``` + +## EsPercolateBolt (org.apache.storm.elasticsearch.bolt.EsPercolateBolt) + +EsPercolateBolt streams tuples directly into Elasticsearch. Tuples are used to send percolate request to specified index type combination. +User should make sure that there are source, index, and type fields declared in preceding bolts or spout. +index and type fields are used for identifying target index and type. +source is a document in JSON format string that will be sent in percolate request to elastic search. + +```java +EsConfig esConfig = new EsConfig(); +esConfig.setClusterName(clusterName); +esConfig.setNodes(new String[]{localhost:9300}); +EsPercolateBolt percolateBolt = new EsPercolateBolt(esConfig); +``` + +### EsConfig (org.apache.storm.elasticsearch.common.EsConfig) --- End diff -- ### EsConfig = ## EsConfig (EsConfig is not only for EsPercolateBolt.) Storm ElasticSearch connector - Key: STORM-845 URL: https://issues.apache.org/jira/browse/STORM-845 Project: Apache Storm Issue Type: New Feature Reporter: Adrian Seungjin Lee Assignee: Adrian Seungjin Lee It would be nice to provide storm driver for elasticsearch, just like it does for hive, redis and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-845) Storm ElasticSearch connector
[ https://issues.apache.org/jira/browse/STORM-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639801#comment-14639801 ] ASF GitHub Bot commented on STORM-845: -- Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/573#issuecomment-12420 @sweetest And I saw elasticsearch-hadoop has EsSpout, but this module doesn't include Spout implementation. https://github.com/elastic/elasticsearch-hadoop/blob/master/storm/src/main/java/org/elasticsearch/storm/EsSpout.java Did you take a look at EsSpout and decided it is not necessary for current module? Storm ElasticSearch connector - Key: STORM-845 URL: https://issues.apache.org/jira/browse/STORM-845 Project: Apache Storm Issue Type: New Feature Reporter: Adrian Seungjin Lee Assignee: Adrian Seungjin Lee It would be nice to provide storm driver for elasticsearch, just like it does for hive, redis and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-845) Storm ElasticSearch connector
[ https://issues.apache.org/jira/browse/STORM-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639782#comment-14639782 ] ASF GitHub Bot commented on STORM-845: -- Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388927 --- Diff: external/storm-elasticsearch/src/main/java/org/apache/storm/elasticsearch/bolt/EsPercolateBolt.java --- @@ -0,0 +1,81 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.storm.elasticsearch.bolt; + +import backtype.storm.task.OutputCollector; +import backtype.storm.task.TopologyContext; +import backtype.storm.topology.OutputFieldsDeclarer; +import backtype.storm.tuple.Fields; +import backtype.storm.tuple.Tuple; +import backtype.storm.tuple.Values; +import org.apache.storm.elasticsearch.common.EsConfig; +import org.elasticsearch.action.percolate.PercolateResponse; +import org.elasticsearch.action.percolate.PercolateSourceBuilder; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Map; + +public class EsPercolateBolt extends AbstractEsBolt { +private static final Logger LOG = LoggerFactory.getLogger(EsPercolateBolt.class); + +/** + * EsPercolateBolt constructor + * @param esConfig Elasticsearch configuration containing node addresses and cluster name {@link EsConfig} + */ +public EsPercolateBolt(EsConfig esConfig) { +super(esConfig); +} + +@Override +public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) { +super.prepare(map, topologyContext, outputCollector); +} + +/** + * Executes percolate request for given tuple. + * @param tuple should contain string values of 3 declared fields: source, index, type + */ +@Override +public void execute(Tuple tuple) { +try { +String source = tuple.getStringByField(source); +String index = tuple.getStringByField(index); +String type = tuple.getStringByField(type); + +PercolateResponse response = client.preparePercolate().setIndices(index).setDocumentType(type) + .setPercolateDoc(PercolateSourceBuilder.docBuilder().setDoc(source)).execute().actionGet(); +if (response.getCount() 0) { +for (PercolateResponse.Match match : response) { +String id = match.getId().toString(); +collector.emit(new Values(id)); +} +} +collector.ack(tuple); +} catch (Exception e) { +e.printStackTrace(); --- End diff -- Please use LOG instead of printing to stderr. (Line 71) Storm ElasticSearch connector - Key: STORM-845 URL: https://issues.apache.org/jira/browse/STORM-845 Project: Apache Storm Issue Type: New Feature Reporter: Adrian Seungjin Lee Assignee: Adrian Seungjin Lee It would be nice to provide storm driver for elasticsearch, just like it does for hive, redis and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: STORM-845 Storm ElasticSearch connector
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388927 --- Diff: external/storm-elasticsearch/src/main/java/org/apache/storm/elasticsearch/bolt/EsPercolateBolt.java --- @@ -0,0 +1,81 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.storm.elasticsearch.bolt; + +import backtype.storm.task.OutputCollector; +import backtype.storm.task.TopologyContext; +import backtype.storm.topology.OutputFieldsDeclarer; +import backtype.storm.tuple.Fields; +import backtype.storm.tuple.Tuple; +import backtype.storm.tuple.Values; +import org.apache.storm.elasticsearch.common.EsConfig; +import org.elasticsearch.action.percolate.PercolateResponse; +import org.elasticsearch.action.percolate.PercolateSourceBuilder; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Map; + +public class EsPercolateBolt extends AbstractEsBolt { +private static final Logger LOG = LoggerFactory.getLogger(EsPercolateBolt.class); + +/** + * EsPercolateBolt constructor + * @param esConfig Elasticsearch configuration containing node addresses and cluster name {@link EsConfig} + */ +public EsPercolateBolt(EsConfig esConfig) { +super(esConfig); +} + +@Override +public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) { +super.prepare(map, topologyContext, outputCollector); +} + +/** + * Executes percolate request for given tuple. + * @param tuple should contain string values of 3 declared fields: source, index, type + */ +@Override +public void execute(Tuple tuple) { +try { +String source = tuple.getStringByField(source); +String index = tuple.getStringByField(index); +String type = tuple.getStringByField(type); + +PercolateResponse response = client.preparePercolate().setIndices(index).setDocumentType(type) + .setPercolateDoc(PercolateSourceBuilder.docBuilder().setDoc(source)).execute().actionGet(); +if (response.getCount() 0) { +for (PercolateResponse.Match match : response) { +String id = match.getId().toString(); +collector.emit(new Values(id)); +} +} +collector.ack(tuple); +} catch (Exception e) { +e.printStackTrace(); --- End diff -- Please use LOG instead of printing to stderr. (Line 71) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (STORM-845) Storm ElasticSearch connector
[ https://issues.apache.org/jira/browse/STORM-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639779#comment-14639779 ] ASF GitHub Bot commented on STORM-845: -- Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388900 --- Diff: external/storm-elasticsearch/src/main/java/org/apache/storm/elasticsearch/bolt/EsIndexBolt.java --- @@ -0,0 +1,71 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.storm.elasticsearch.bolt; + +import backtype.storm.task.OutputCollector; +import backtype.storm.task.TopologyContext; +import backtype.storm.topology.OutputFieldsDeclarer; +import backtype.storm.tuple.Tuple; +import org.apache.storm.elasticsearch.common.EsConfig; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Map; + +public class EsIndexBolt extends AbstractEsBolt { +private static final Logger LOG = LoggerFactory.getLogger(EsIndexBolt.class); + +/** + * EsIndexBolt constructor + * @param esConfig Elasticsearch configuration containing node addresses and cluster name {@link EsConfig} + */ +public EsIndexBolt(EsConfig esConfig) { +super(esConfig); +} + +@Override +public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) { +super.prepare(map, topologyContext, outputCollector); +} + +/** + * Executes index request for given tuple. + * @param tuple should contain string values of 4 declared fields: source, index, type, id + */ +@Override +public void execute(Tuple tuple) { +try { +String source = tuple.getStringByField(source); +String index = tuple.getStringByField(index); +String type = tuple.getStringByField(type); +String id = tuple.getStringByField(id); + +client.prepareIndex(index, type, id).setSource(source).execute().actionGet(); +collector.ack(tuple); +} catch (Exception e) { +e.printStackTrace(); --- End diff -- Please use LOG instead of printing to stderr. Storm ElasticSearch connector - Key: STORM-845 URL: https://issues.apache.org/jira/browse/STORM-845 Project: Apache Storm Issue Type: New Feature Reporter: Adrian Seungjin Lee Assignee: Adrian Seungjin Lee It would be nice to provide storm driver for elasticsearch, just like it does for hive, redis and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: STORM-845 Storm ElasticSearch connector
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388973 --- Diff: external/storm-elasticsearch/src/main/java/org/apache/storm/elasticsearch/trident/EsStateFactory.java --- @@ -0,0 +1,51 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.storm.elasticsearch.trident; + +import backtype.storm.task.IMetricsContext; +import org.apache.storm.elasticsearch.common.EsConfig; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import storm.trident.state.State; +import storm.trident.state.StateFactory; + +import java.util.Map; + +public class EsStateFactory implements StateFactory { +private static final Logger LOG = LoggerFactory.getLogger(EsStateFactory.class); --- End diff -- LOG = redundency, can be removed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (STORM-845) Storm ElasticSearch connector
[ https://issues.apache.org/jira/browse/STORM-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639784#comment-14639784 ] ASF GitHub Bot commented on STORM-845: -- Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388973 --- Diff: external/storm-elasticsearch/src/main/java/org/apache/storm/elasticsearch/trident/EsStateFactory.java --- @@ -0,0 +1,51 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.storm.elasticsearch.trident; + +import backtype.storm.task.IMetricsContext; +import org.apache.storm.elasticsearch.common.EsConfig; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import storm.trident.state.State; +import storm.trident.state.StateFactory; + +import java.util.Map; + +public class EsStateFactory implements StateFactory { +private static final Logger LOG = LoggerFactory.getLogger(EsStateFactory.class); --- End diff -- LOG = redundency, can be removed Storm ElasticSearch connector - Key: STORM-845 URL: https://issues.apache.org/jira/browse/STORM-845 Project: Apache Storm Issue Type: New Feature Reporter: Adrian Seungjin Lee Assignee: Adrian Seungjin Lee It would be nice to provide storm driver for elasticsearch, just like it does for hive, redis and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: STORM-959:remove unnecessary dependency from s...
Github user caofangkun commented on the pull request: https://github.com/apache/storm/pull/652#issuecomment-124295316 duplicates #609 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (STORM-594) Auto-Scaling Resources in a Topology
[ https://issues.apache.org/jira/browse/STORM-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639827#comment-14639827 ] Memory Lake commented on STORM-594: --- Hi Jon, what do you mean by using latency*events/sec? It seems to be the same thing as capacity in my understanding. You also mentioned that each worker has the same number and type of bolts and spouts, are you suggesting that every bolt should have a parallel degree that is larger than or even a multiple of the number of workers? Auto-Scaling Resources in a Topology Key: STORM-594 URL: https://issues.apache.org/jira/browse/STORM-594 Project: Apache Storm Issue Type: New Feature Reporter: HARSHA BALASUBRAMANIAN Assignee: Pooyan Jamshidi Priority: Minor Attachments: Algorithm for Auto-Scaling.pdf, Project Plan and Scope.pdf Original Estimate: 504h Remaining Estimate: 504h A useful feature missing in Storm topologies is the ability to auto-scale resources, based on a pre-configured metric. The feature proposed here aims to build such a auto-scaling mechanism using a feedback system. A brief overview of the feature is provided here. The finer details of the required components and the scaling algorithm (uses a Feedback System) are provided in the PDFs attached. Brief Overview: Topologies may get created with or (ideally) without parallelism hints and tasks in their bolts and spouts, before submitting them, If auto-scaling is set in the topology (using a Boolean flag), the topology will also get submitted to the auto-scale module. The auto-scale module will read a pre-configured metric (threshold/min) from a configuration file. Using this value, the topology's resources will be modified till the threshold is reached. At each stage in the auto-scale module's execution, feedback from the previous execution will be used to tune the resources. The systems that need to be in place to achieve this are: 1. Metrics which provide the current threshold (no: of acks per minute) for a topology's spouts and bolts. 2. Access to Storm's CLI tool which can change a topology's resources are runtime. 3. A new java or clojure module which runs within the Nimbus daemon or in parallel to it. This will be the auto-scale module. Limitations: (This is not an exhaustive list. More will be added as the design matures. Also, some of the points here may get resolved) To test the feature there will be a number of limitations in the first release. As the feature matures, it will be allowed to scale more 1. The auto-scale module will be limited to a few topologies (maybe 4 or 5 at maximum) 2. New bolts will not be added to scale a topology. This feature will be limited to increasing the resources within the existing topology. 3. Topology resources will not be decreased when it is running at more than the required number (except for a few cases) 4. This feature will work only for long-running topologies where the input threshold can become equal to or greater than the required threshold -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: STORM-845 Storm ElasticSearch connector
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35388900 --- Diff: external/storm-elasticsearch/src/main/java/org/apache/storm/elasticsearch/bolt/EsIndexBolt.java --- @@ -0,0 +1,71 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.storm.elasticsearch.bolt; + +import backtype.storm.task.OutputCollector; +import backtype.storm.task.TopologyContext; +import backtype.storm.topology.OutputFieldsDeclarer; +import backtype.storm.tuple.Tuple; +import org.apache.storm.elasticsearch.common.EsConfig; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Map; + +public class EsIndexBolt extends AbstractEsBolt { +private static final Logger LOG = LoggerFactory.getLogger(EsIndexBolt.class); + +/** + * EsIndexBolt constructor + * @param esConfig Elasticsearch configuration containing node addresses and cluster name {@link EsConfig} + */ +public EsIndexBolt(EsConfig esConfig) { +super(esConfig); +} + +@Override +public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) { +super.prepare(map, topologyContext, outputCollector); +} + +/** + * Executes index request for given tuple. + * @param tuple should contain string values of 4 declared fields: source, index, type, id + */ +@Override +public void execute(Tuple tuple) { +try { +String source = tuple.getStringByField(source); +String index = tuple.getStringByField(index); +String type = tuple.getStringByField(type); +String id = tuple.getStringByField(id); + +client.prepareIndex(index, type, id).setSource(source).execute().actionGet(); +collector.ack(tuple); +} catch (Exception e) { +e.printStackTrace(); --- End diff -- Please use LOG instead of printing to stderr. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (STORM-297) Storm Performance cannot be scaled up by adding more CPU cores
[ https://issues.apache.org/jira/browse/STORM-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639778#comment-14639778 ] Manu Zhang commented on STORM-297: -- [~jia.fu] This jira is resolved. You can get help on the user or dev list with detailed description of your issue. Storm Performance cannot be scaled up by adding more CPU cores -- Key: STORM-297 URL: https://issues.apache.org/jira/browse/STORM-297 Project: Apache Storm Issue Type: Bug Reporter: Sean Zhong Assignee: Sean Zhong Labels: Performance, netty Fix For: 0.9.2-incubating Attachments: Storm_performance_fix.pdf, storm_Netty_receiver_diagram.png, storm_conf.txt, storm_performance_fix.patch, worker_throughput_without_storm-297.png We cannot scale up the performance by adding more CPU cores and increasing parallelism. For a 2 layer topology Spout ---shuffle grouping-- bolt, when message size is small (around 100 bytes), we can find in the below picture that neither the CPU nor the network is saturated. When message size is 100 bytes, only 40% of CPU is used, only 18% of network is used, although we have a high parallelism (overall we have 144 executors) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-845) Storm ElasticSearch connector
[ https://issues.apache.org/jira/browse/STORM-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639786#comment-14639786 ] ASF GitHub Bot commented on STORM-845: -- Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35389019 --- Diff: external/storm-elasticsearch/src/main/java/org/apache/storm/elasticsearch/bolt/EsPercolateBolt.java --- @@ -0,0 +1,81 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.storm.elasticsearch.bolt; + +import backtype.storm.task.OutputCollector; +import backtype.storm.task.TopologyContext; +import backtype.storm.topology.OutputFieldsDeclarer; +import backtype.storm.tuple.Fields; +import backtype.storm.tuple.Tuple; +import backtype.storm.tuple.Values; +import org.apache.storm.elasticsearch.common.EsConfig; +import org.elasticsearch.action.percolate.PercolateResponse; +import org.elasticsearch.action.percolate.PercolateSourceBuilder; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Map; + +public class EsPercolateBolt extends AbstractEsBolt { +private static final Logger LOG = LoggerFactory.getLogger(EsPercolateBolt.class); + +/** + * EsPercolateBolt constructor + * @param esConfig Elasticsearch configuration containing node addresses and cluster name {@link EsConfig} + */ +public EsPercolateBolt(EsConfig esConfig) { +super(esConfig); +} + +@Override +public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) { +super.prepare(map, topologyContext, outputCollector); +} + +/** + * Executes percolate request for given tuple. + * @param tuple should contain string values of 3 declared fields: source, index, type + */ +@Override +public void execute(Tuple tuple) { +try { +String source = tuple.getStringByField(source); +String index = tuple.getStringByField(index); +String type = tuple.getStringByField(type); + +PercolateResponse response = client.preparePercolate().setIndices(index).setDocumentType(type) + .setPercolateDoc(PercolateSourceBuilder.docBuilder().setDoc(source)).execute().actionGet(); +if (response.getCount() 0) { +for (PercolateResponse.Match match : response) { +String id = match.getId().toString(); +collector.emit(new Values(id)); --- End diff -- I'm curious that emitting only query id list covers general use case. If users need to know query's information, they should write codes to retrieve it. If PercolateResponse.Match can be serialized, I think it's better to emit it. Storm ElasticSearch connector - Key: STORM-845 URL: https://issues.apache.org/jira/browse/STORM-845 Project: Apache Storm Issue Type: New Feature Reporter: Adrian Seungjin Lee Assignee: Adrian Seungjin Lee It would be nice to provide storm driver for elasticsearch, just like it does for hive, redis and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: STORM-845 Storm ElasticSearch connector
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/storm/pull/573#discussion_r35389019 --- Diff: external/storm-elasticsearch/src/main/java/org/apache/storm/elasticsearch/bolt/EsPercolateBolt.java --- @@ -0,0 +1,81 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.storm.elasticsearch.bolt; + +import backtype.storm.task.OutputCollector; +import backtype.storm.task.TopologyContext; +import backtype.storm.topology.OutputFieldsDeclarer; +import backtype.storm.tuple.Fields; +import backtype.storm.tuple.Tuple; +import backtype.storm.tuple.Values; +import org.apache.storm.elasticsearch.common.EsConfig; +import org.elasticsearch.action.percolate.PercolateResponse; +import org.elasticsearch.action.percolate.PercolateSourceBuilder; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Map; + +public class EsPercolateBolt extends AbstractEsBolt { +private static final Logger LOG = LoggerFactory.getLogger(EsPercolateBolt.class); + +/** + * EsPercolateBolt constructor + * @param esConfig Elasticsearch configuration containing node addresses and cluster name {@link EsConfig} + */ +public EsPercolateBolt(EsConfig esConfig) { +super(esConfig); +} + +@Override +public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) { +super.prepare(map, topologyContext, outputCollector); +} + +/** + * Executes percolate request for given tuple. + * @param tuple should contain string values of 3 declared fields: source, index, type + */ +@Override +public void execute(Tuple tuple) { +try { +String source = tuple.getStringByField(source); +String index = tuple.getStringByField(index); +String type = tuple.getStringByField(type); + +PercolateResponse response = client.preparePercolate().setIndices(index).setDocumentType(type) + .setPercolateDoc(PercolateSourceBuilder.docBuilder().setDoc(source)).execute().actionGet(); +if (response.getCount() 0) { +for (PercolateResponse.Match match : response) { +String id = match.getId().toString(); +collector.emit(new Values(id)); --- End diff -- I'm curious that emitting only query id list covers general use case. If users need to know query's information, they should write codes to retrieve it. If PercolateResponse.Match can be serialized, I think it's better to emit it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Storm and Firewalls
Will you explain to me what is happening? What protocols / ports / port ranges does Storm use? I am setting up external firewalls for my storm clusters. I went to the default yaml, looking for port #'s / settings, looked at the configs on my clusters and came up with a list of ports and protocols. When I went to verify, I found ports open that I didn't expect. Below, I am listing the ports that I did not expect / do not understand. = sudo lsof | head -n 1 sudo lsof | grep TCP | grep {PID} COMMAND PID TID USER FD TYPE DEVICE SIZE/OFF NODE NAME nimbus node java 1324 root 57u IPv6 10486 0t0TCP *:58607 (LISTEN) java 1324 root 59u IPv6 10488 0t0TCP *:47473 (LISTEN) worker node 2 java650root 58u IPv61725402 0t0TCP *:58284 (LISTEN) java650root 60u IPv61725404 0t0TCP *:55041 (LISTEN) worker node 3 java 2094root 58u IPv61735153 0t0TCP *:59384 (LISTEN) java 2094root 60u IPv61735155 0t0TCP *:49499 (LISTEN) worker node 4 java 6278root 58u IPv61763570 0t0TCP *:40328 (LISTEN) java 6278root 60u IPv61763572 0t0TCP *:42887 (LISTEN) = Will you explain to me what is happening? What protocols / ports / port ranges does Storm use? + Jeff Maass maas...@gmail.com linkedin.com/in/jeffmaass stackoverflow.com/users/373418/maassql +
[jira] [Commented] (STORM-845) Storm ElasticSearch connector
[ https://issues.apache.org/jira/browse/STORM-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638265#comment-14638265 ] ASF GitHub Bot commented on STORM-845: -- Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/573#issuecomment-123992565 @sweetest Good. I also think supporting ES is a valuable feature since many users already enjoys with ES + Kibana. And at a glimpse EsSpout and EsBolt contains only very basic feature so we may be eager for having more powerful ES + Storm connector. I'd also like to volunteer to be a sponsor this module. I'll take a look and comment later. Storm ElasticSearch connector - Key: STORM-845 URL: https://issues.apache.org/jira/browse/STORM-845 Project: Apache Storm Issue Type: New Feature Reporter: Adrian Seungjin Lee Assignee: Adrian Seungjin Lee It would be nice to provide storm driver for elasticsearch, just like it does for hive, redis and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-960) Hive-Bolt can lose tuples when flushing data
[ https://issues.apache.org/jira/browse/STORM-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638754#comment-14638754 ] ASF GitHub Bot commented on STORM-960: -- GitHub user dossett opened a pull request: https://github.com/apache/storm/pull/653 STORM-960 HiveBolt should ack tuples only after flushing You can merge this pull request into a Git repository by running: $ git pull https://github.com/dossett/storm HiveBoltAck Alternatively you can review and apply these changes as the patch at: https://github.com/apache/storm/pull/653.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #653 commit aac496ade94210888a0ca473df648855c408496b Author: Sriharsha Chintalapani m...@harsha.io Date: 2015-07-21T22:42:22Z STORM-951. Storm Hive connector leaking connections. commit 6a5d0dcb44b58063035bfdaac4ebddba401dc914 Author: Aaron Dossett aaron.doss...@target.com Date: 2015-07-23T12:50:25Z STORM 960: HiveBolt should only ack after succesful flush Hive-Bolt can lose tuples when flushing data Key: STORM-960 URL: https://issues.apache.org/jira/browse/STORM-960 Project: Apache Storm Issue Type: Improvement Components: external Reporter: Aaron Dossett Priority: Minor In HiveBolt's execute method tuples are ack'd as they are received. When a batchsize of tuples has been received, the writers are flushed. However, if the flush fails only the most recent tuple will be marked as failed. All prior tuples will already have been ack'd. This creates a window for data loss. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: STORM-960 HiveBolt should ack tuples only afte...
GitHub user dossett opened a pull request: https://github.com/apache/storm/pull/653 STORM-960 HiveBolt should ack tuples only after flushing You can merge this pull request into a Git repository by running: $ git pull https://github.com/dossett/storm HiveBoltAck Alternatively you can review and apply these changes as the patch at: https://github.com/apache/storm/pull/653.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #653 commit aac496ade94210888a0ca473df648855c408496b Author: Sriharsha Chintalapani m...@harsha.io Date: 2015-07-21T22:42:22Z STORM-951. Storm Hive connector leaking connections. commit 6a5d0dcb44b58063035bfdaac4ebddba401dc914 Author: Aaron Dossett aaron.doss...@target.com Date: 2015-07-23T12:50:25Z STORM 960: HiveBolt should only ack after succesful flush --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Assigned] (STORM-960) Hive-Bolt can lose tuples when flushing data
[ https://issues.apache.org/jira/browse/STORM-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Dossett reassigned STORM-960: --- Assignee: Aaron Dossett Hive-Bolt can lose tuples when flushing data Key: STORM-960 URL: https://issues.apache.org/jira/browse/STORM-960 Project: Apache Storm Issue Type: Improvement Components: external Reporter: Aaron Dossett Assignee: Aaron Dossett Priority: Minor In HiveBolt's execute method tuples are ack'd as they are received. When a batchsize of tuples has been received, the writers are flushed. However, if the flush fails only the most recent tuple will be marked as failed. All prior tuples will already have been ack'd. This creates a window for data loss. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-960) Hive-Bolt can lose tuples when flushing data
[ https://issues.apache.org/jira/browse/STORM-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638755#comment-14638755 ] Aaron Dossett commented on STORM-960: - I realize this PR also contains the commit for STORM-951, which I was testing with. I can remove the STORM-951 commit if necessary. Hive-Bolt can lose tuples when flushing data Key: STORM-960 URL: https://issues.apache.org/jira/browse/STORM-960 Project: Apache Storm Issue Type: Improvement Components: external Reporter: Aaron Dossett Assignee: Aaron Dossett Priority: Minor In HiveBolt's execute method tuples are ack'd as they are received. When a batchsize of tuples has been received, the writers are flushed. However, if the flush fails only the most recent tuple will be marked as failed. All prior tuples will already have been ack'd. This creates a window for data loss. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: Eventhub spout meta data
GitHub user tandrup opened a pull request: https://github.com/apache/storm/pull/651 Eventhub spout meta data Add partition,seq-number fields to emitted tupples for event sourcing consumers, which need to preserve partition order and be able to replay. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tandrup/storm eventhub-spout-meta-data Alternatively you can review and apply these changes as the patch at: https://github.com/apache/storm/pull/651.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #651 commit d5a7f6081ed65716a6c6fe0121b19597b7d3fc00 Author: rsltrifork r...@trifork.com Date: 2015-07-17T11:27:40Z Add partition,seq-number fields to emitted tupples for event sourcing consumers, which need to preserve partition order and be able to replay. Signed-off-by: Mads Mætzke Tandrup m...@maetzke-tandrup.dk commit aebe26013792d95caecd9deb285b9fba7df730bd Author: Mads Mætzke Tandrup m...@maetzke-tandrup.dk Date: 2015-07-23T07:43:16Z Aligning naming of sequence number --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (STORM-959) remove unnecessary dependency from storm-hive/pom.xml
caofangkun created STORM-959: Summary: remove unnecessary dependency from storm-hive/pom.xml Key: STORM-959 URL: https://issues.apache.org/jira/browse/STORM-959 Project: Apache Storm Issue Type: Bug Components: external Affects Versions: 0.11.0 Reporter: caofangkun Assignee: caofangkun Priority: Minor [org.apache.calcite:calcite-core:0.9.2-incubating|https://github.com/apache/storm/blob/master/external/storm-hive/pom.xml#L84] does not take affect at all. Becase of hive-exec org.apache.calcite:calcite-core:0.9.2-incubating-SNAPSHOT will be downloaded {code} Downloading: https://repository.apache.org/snapshots/org/apache/calcite/calcite-core/0.9.2-incubating-SNAPSHOT/maven-metadata.xml Downloading: https://repository.apache.org/snapshots/org/apache/calcite/calcite-core/0.9.2-incubating-SNAPSHOT/calcite-core-0.9.2-incubating-SNAPSHOT.pom [WARNING] The POM for org.apache.calcite:calcite-core:jar:0.9.2-incubating-SNAPSHOT is missing, no dependency information available {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] storm pull request: STORM-959:remove unnecessary dependency from s...
Github user caofangkun commented on the pull request: https://github.com/apache/storm/pull/652#issuecomment-124034146 @harshach Could you please have a look at this PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] storm pull request: STORM-959:remove unnecessary dependency from s...
GitHub user caofangkun opened a pull request: https://github.com/apache/storm/pull/652 STORM-959:remove unnecessary dependency from storm-hive/pom.xml org.apache.calcite:calcite-core:0.9.2-incubating does not take affect at all. You can merge this pull request into a Git repository by running: $ git pull https://github.com/caofangkun/apache-storm storm-959 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/storm/pull/652.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #652 commit cd8bdfed00bf6ef6eb8322c2e7be2a32d377235c Author: caofangkun caofang...@gmail.com Date: 2015-07-23T09:32:48Z STORM-959:remove unnecessary dependency from storm-hive/pom.xml --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (STORM-959) remove unnecessary dependency from storm-hive/pom.xml
[ https://issues.apache.org/jira/browse/STORM-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638555#comment-14638555 ] ASF GitHub Bot commented on STORM-959: -- GitHub user caofangkun opened a pull request: https://github.com/apache/storm/pull/652 STORM-959:remove unnecessary dependency from storm-hive/pom.xml org.apache.calcite:calcite-core:0.9.2-incubating does not take affect at all. You can merge this pull request into a Git repository by running: $ git pull https://github.com/caofangkun/apache-storm storm-959 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/storm/pull/652.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #652 commit cd8bdfed00bf6ef6eb8322c2e7be2a32d377235c Author: caofangkun caofang...@gmail.com Date: 2015-07-23T09:32:48Z STORM-959:remove unnecessary dependency from storm-hive/pom.xml remove unnecessary dependency from storm-hive/pom.xml - Key: STORM-959 URL: https://issues.apache.org/jira/browse/STORM-959 Project: Apache Storm Issue Type: Bug Components: external Affects Versions: 0.11.0 Reporter: caofangkun Assignee: caofangkun Priority: Minor [org.apache.calcite:calcite-core:0.9.2-incubating|https://github.com/apache/storm/blob/master/external/storm-hive/pom.xml#L84] does not take affect at all. Becase of hive-exec org.apache.calcite:calcite-core:0.9.2-incubating-SNAPSHOT will be downloaded {code} Downloading: https://repository.apache.org/snapshots/org/apache/calcite/calcite-core/0.9.2-incubating-SNAPSHOT/maven-metadata.xml Downloading: https://repository.apache.org/snapshots/org/apache/calcite/calcite-core/0.9.2-incubating-SNAPSHOT/calcite-core-0.9.2-incubating-SNAPSHOT.pom [WARNING] The POM for org.apache.calcite:calcite-core:jar:0.9.2-incubating-SNAPSHOT is missing, no dependency information available {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-959) remove unnecessary dependency from storm-hive/pom.xml
[ https://issues.apache.org/jira/browse/STORM-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638558#comment-14638558 ] ASF GitHub Bot commented on STORM-959: -- Github user caofangkun commented on the pull request: https://github.com/apache/storm/pull/652#issuecomment-124034146 @harshach Could you please have a look at this PR? remove unnecessary dependency from storm-hive/pom.xml - Key: STORM-959 URL: https://issues.apache.org/jira/browse/STORM-959 Project: Apache Storm Issue Type: Bug Components: external Affects Versions: 0.11.0 Reporter: caofangkun Assignee: caofangkun Priority: Minor [org.apache.calcite:calcite-core:0.9.2-incubating|https://github.com/apache/storm/blob/master/external/storm-hive/pom.xml#L84] does not take affect at all. Becase of hive-exec org.apache.calcite:calcite-core:0.9.2-incubating-SNAPSHOT will be downloaded {code} Downloading: https://repository.apache.org/snapshots/org/apache/calcite/calcite-core/0.9.2-incubating-SNAPSHOT/maven-metadata.xml Downloading: https://repository.apache.org/snapshots/org/apache/calcite/calcite-core/0.9.2-incubating-SNAPSHOT/calcite-core-0.9.2-incubating-SNAPSHOT.pom [WARNING] The POM for org.apache.calcite:calcite-core:jar:0.9.2-incubating-SNAPSHOT is missing, no dependency information available {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)