[GitHub] storm pull request #1770: STORM-2197: NimbusClient connections leak due to T...
Github user asfgit closed the pull request at: https://github.com/apache/storm/pull/1770 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] storm pull request #1771: STORM-2197: NimbusClient connections leak due to l...
Github user asfgit closed the pull request at: https://github.com/apache/storm/pull/1771
[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format
Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/1751

We might need to change the delimiter to ';' rather than '\n', since with an Avro schema definition inline the statement becomes really hard to edit. (It was hard to edit indeed...) Maybe we need to initiate a discussion around this on dev@. I'll handle this.
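The delimiter change proposed above can be sketched as follows — a minimal, hypothetical splitter (`StatementSplitter` is an illustrative name, not Storm's actual API) that cuts a SQL script on ';' instead of newlines, so a statement with an embedded multi-line Avro schema survives. A real implementation would also need to ignore ';' occurring inside quoted strings:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a SQL script on ';' rather than '\n',
// so statements can span multiple lines.
public class StatementSplitter {
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<String>();
        for (String stmt : script.split(";")) {
            String trimmed = stmt.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        String script = "CREATE EXTERNAL TABLE T (ID INT)\n"
                + "  LOCATION 'kafka://localhost:2181/brokers?topic=t';\n"
                + "INSERT INTO T SELECT 1";
        // Two statements are recovered despite the embedded newlines.
        System.out.println(split(script).size());
    }
}
```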
[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format
Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/1751

@vesense I got an exception while executing the topology:
```
21:12:44.247 [main] INFO o.a.s.s.r.DataSourcesRegistry - Registering scheme kafka with org.apache.storm.sql.kafka.KafkaDataSourcesProvider@338c99c8
Exception in thread "main" java.lang.IllegalStateException: Bolt 'b-0-LOGICALFILTER_6-LOGICALPROJECT_7' contains a non-serializable field of type org.apache.avro.Schema$RecordSchema, which was instantiated prior to topology creation. org.apache.avro.Schema$RecordSchema should be instantiated within the prepare method of 'b-0-LOGICALFILTER_6-LOGICALPROJECT_7 at the earliest.
	at org.apache.storm.topology.TopologyBuilder.createTopology(TopologyBuilder.java:127)
	at org.apache.storm.trident.topology.TridentTopologyBuilder.buildTopology(TridentTopologyBuilder.java:265)
	at org.apache.storm.trident.TridentTopology.build(TridentTopology.java:529)
	at org.apache.storm.sql.StormSqlImpl.submit(StormSqlImpl.java:134)
	at org.apache.storm.sql.StormSqlRunner.main(StormSqlRunner.java:63)
Caused by: java.lang.RuntimeException: java.io.NotSerializableException: org.apache.avro.Schema$RecordSchema
	at org.apache.storm.utils.Utils.javaSerialize(Utils.java:235)
	at org.apache.storm.topology.TopologyBuilder.createTopology(TopologyBuilder.java:122)
	... 4 more
Caused by: java.io.NotSerializableException: org.apache.avro.Schema$RecordSchema
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
	...
```
Maybe `schema` shouldn't be parsed in the constructor. (We could use it there for verification, but shouldn't store it.) Instead, you can initialize it lazily on the first call to write() or deserialize(), and reuse it afterwards. Please let me know whether your patch works well with the SQL runner against a remote cluster.
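The lazy-initialization idea suggested above can be sketched as follows. Only the schema *string* (which is serializable) is stored; the parsed schema is kept in a `transient` field and rebuilt on first use on each worker. `ParsedSchema` here is a stand-in for the non-serializable `org.apache.avro.Schema`, and all names are illustrative, not Storm's actual API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch of lazy schema initialization: only the schema string travels
// with the serialized bolt; the parsed form is rebuilt lazily.
public class AvroSerializerSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Stand-in for a non-serializable parsed representation (e.g. Avro's Schema).
    static class ParsedSchema {
        final String definition;
        ParsedSchema(String definition) { this.definition = definition; }
    }

    private final String schemaString;       // serializable, shipped with the bolt
    private transient ParsedSchema schema;   // NOT serialized; rebuilt per worker

    public AvroSerializerSketch(String schemaString) {
        // Could parse here once to *validate*, but must not store the result.
        this.schemaString = schemaString;
    }

    private ParsedSchema schema() {
        if (schema == null) {
            schema = new ParsedSchema(schemaString);  // parse on first use, then reuse
        }
        return schema;
    }

    public String write(String record) {
        return schema().definition + ":" + record;   // placeholder for real encoding
    }

    public static void main(String[] args) throws Exception {
        AvroSerializerSketch before = new AvroSerializerSketch("myschema");
        // Java serialization now succeeds because the parsed schema is transient.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(before);
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        AvroSerializerSketch after = (AvroSerializerSketch) in.readObject();
        System.out.println(after.write("record1"));  // prints "myschema:record1"
    }
}
```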
[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format
Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/1751

Just an idea: we might even infer the Avro schema from the SQL table column information. The tricky part is the mapping between Avro types and SQL types. https://github.com/databricks/spark-avro seems to do something similar.
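The schema-inference idea above could look roughly like this. The SQL-to-Avro type mapping is an assumption chosen for illustration (loosely in the spirit of spark-avro); class and method names are hypothetical, not Storm's API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: derive an Avro record schema (as JSON text)
// from SQL column definitions.
public class SqlToAvroSchema {
    // Assumed SQL-to-Avro primitive mapping; a real mapping needs more types.
    static String avroType(String sqlType) {
        switch (sqlType.toUpperCase()) {
            case "INT": case "INTEGER": return "int";
            case "BIGINT":              return "long";
            case "FLOAT":               return "float";
            case "DOUBLE":              return "double";
            case "BOOLEAN":             return "boolean";
            default:                    return "string";
        }
    }

    static String recordSchema(String name, Map<String, String> columns) {
        StringBuilder fields = new StringBuilder();
        for (Map.Entry<String, String> col : columns.entrySet()) {
            if (fields.length() > 0) fields.append(", ");
            fields.append("{\"name\": \"").append(col.getKey())
                  .append("\", \"type\": \"").append(avroType(col.getValue())).append("\"}");
        }
        return "{\"type\": \"record\", \"name\": \"" + name
                + "\", \"fields\": [" + fields + "]}";
    }

    public static void main(String[] args) {
        // Columns of the LARGE_ORDERS table from the example DDL.
        Map<String, String> cols = new LinkedHashMap<String, String>();
        cols.put("ID", "INT");
        cols.put("TOTAL", "INT");
        System.out.println(recordSchema("large_orders", cols));
    }
}
```

With this, the `input.avro.schema` / `output.avro.schema` properties could be generated from the `CREATE EXTERNAL TABLE` column list instead of being hand-written.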
[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format
Github user vesense commented on the issue: https://github.com/apache/storm/pull/1751

@HeartSaVioR Oh, the `NotSerializableException` is because I removed `CachedSchemas` in a previous commit. Sorry for that; fixed.
[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format
Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/1751

Read JSON, calculate & filter, and store to Avro:
```
CREATE EXTERNAL TABLE ORDERS (ID INT PRIMARY KEY, UNIT_PRICE INT, QUANTITY INT)
STORED AS
  INPUTFORMAT 'org.apache.storm.sql.runtime.serde.json.JsonScheme'
  OUTPUTFORMAT 'org.apache.storm.sql.runtime.serde.json.JsonSerializer'
LOCATION 'kafka://localhost:2181/brokers?topic=orders'
TBLPROPERTIES '{
  "producer": {
    "bootstrap.servers": "localhost:9092",
    "acks": "1",
    "key.serializer": "org.apache.storm.kafka.IntSerializer",
    "value.serializer": "org.apache.storm.kafka.ByteBufferSerializer"
  }
}'

CREATE EXTERNAL TABLE LARGE_ORDERS (ID INT PRIMARY KEY, TOTAL INT)
STORED AS
  INPUTFORMAT 'org.apache.storm.sql.runtime.serde.avro.AvroScheme'
  OUTPUTFORMAT 'org.apache.storm.sql.runtime.serde.avro.AvroSerializer'
LOCATION 'kafka://localhost:2181/brokers?topic=large_orders'
TBLPROPERTIES '{
  "producer": {
    "bootstrap.servers": "localhost:9092",
    "acks": "1",
    "key.serializer": "org.apache.storm.kafka.IntSerializer",
    "value.serializer": "org.apache.storm.kafka.ByteBufferSerializer"
  },
  "input.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", \"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", \"type\": \"int\"} ]}",
  "output.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", \"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", \"type\": \"int\"} ]}"
}'
```
Read Avro and store to JSON:
```
CREATE EXTERNAL TABLE LARGE_ORDERS (ID INT PRIMARY KEY, TOTAL INT)
STORED AS
  INPUTFORMAT 'org.apache.storm.sql.runtime.serde.avro.AvroScheme'
  OUTPUTFORMAT 'org.apache.storm.sql.runtime.serde.avro.AvroSerializer'
LOCATION 'kafka://localhost:2181/brokers?topic=large_orders'
TBLPROPERTIES '{
  "producer": {
    "bootstrap.servers": "localhost:9092",
    "acks": "1",
    "key.serializer": "org.apache.storm.kafka.IntSerializer",
    "value.serializer": "org.apache.storm.kafka.ByteBufferSerializer"
  },
  "input.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", \"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", \"type\": \"int\"} ]}",
  "output.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", \"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", \"type\": \"int\"} ]}"
}'

CREATE EXTERNAL TABLE LARGE_ORDERS_JSON (ID INT PRIMARY KEY, TOTAL INT)
STORED AS
  INPUTFORMAT 'org.apache.storm.sql.runtime.serde.json.JsonScheme'
  OUTPUTFORMAT 'org.apache.storm.sql.runtime.serde.json.JsonSerializer'
LOCATION 'kafka://localhost:2181/brokers?topic=large_orders_json'
TBLPROPERTIES '{
  "producer": {
    "bootstrap.servers": "localhost:9092",
    "acks": "1",
    "key.serializer": "org.apache.storm.kafka.IntSerializer",
    "value.serializer": "org.apache.storm.kafka.ByteBufferSerializer"
  }
}'

INSERT INTO LARGE_ORDERS_JSON SELECT ID, TOTAL FROM LARGE_ORDERS
```
Manual tests succeeded. +1 Thanks for the great work @vesense
[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format
Github user vesense commented on the issue: https://github.com/apache/storm/pull/1751

Yes, I also find that SQL with an inline Avro schema string is a little unwieldy; it's not easy for users to edit. Maybe I can take time to improve it later.
[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format
Github user vesense commented on the issue: https://github.com/apache/storm/pull/1751

@HeartSaVioR Thanks for your patience. After this PR gets merged, I will update the PR for TSV/CSV format ASAP.
[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format
Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/1751

@vesense Could you craft a patch for the 1.x branch? It should compile with JDK 1.7. This is the build failure with JDK 1.7 on 1.x-branch with this patch:
```
[ERROR] COMPILATION ERROR :
[ERROR] /Users/jlim/WorkArea/JavaProjects/storm/external/sql/storm-sql-runtime/src/test/org/apache/storm/sql/TestAvroSerializer.java:[47,40] incompatible types
  required: java.util.List
  found:    java.util.ArrayList>>
[INFO] 1 error
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] storm-sql-runtime .. FAILURE [ 2.960 s]
[INFO] storm-sql-core ..... SKIPPED
[INFO] storm-sql-kafka .... SKIPPED
[INFO] storm-sql-redis .... SKIPPED
[INFO] storm-sql-mongodb .. SKIPPED
[INFO] sql ................ SKIPPED
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 3.844 s
[INFO] Finished at: 2016-11-11T23:11:46+09:00
[INFO] Final Memory: 35M/452M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile (default-testCompile) on project storm-sql-runtime: Compilation failure
[ERROR] /Users/jlim/WorkArea/JavaProjects/storm/external/sql/storm-sql-runtime/src/test/org/apache/storm/sql/TestAvroSerializer.java:[47,40] incompatible types
[ERROR] required: java.util.List
[ERROR] found: java.util.ArrayList>>
[ERROR] -> [Help 1]
```
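The error above (the generic type arguments were stripped by the mail archive, so the exact types are unknown) is characteristic of a JDK 7/8 difference: JDK 8 infers generic type arguments in more positions than JDK 7, notably for generic method calls in argument position, so code that compiles on 8 can fail on 1.7. A common fix, sketched here with assumed types, is to spell the type arguments out explicitly:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the usual JDK 1.7-compatibility fix for this class of error.
public class Jdk7Generics {
    // JDK 7-safe construction: the nested element type is written out
    // explicitly instead of being left to inference.
    static List<List<String>> rows() {
        List<List<String>> rows = new ArrayList<List<String>>();
        rows.add(Arrays.asList("a", "b"));
        return rows;
    }

    public static void main(String[] args) {
        // A form like the following leans on type inference that JDK 8
        // handles more broadly than JDK 7, and may not compile on 1.7:
        //   List<List<String>> rows = new ArrayList<>(Arrays.asList(Arrays.asList("a", "b")));
        System.out.println(rows().get(0).get(1));
    }
}
```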
[GitHub] storm pull request #1751: [STORM-2172][SQL] Support Avro as input / output f...
Github user asfgit closed the pull request at: https://github.com/apache/storm/pull/1751
[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format
Github user vesense commented on the issue: https://github.com/apache/storm/pull/1751

@HeartSaVioR OK. I will create a PR for 1.x-branch.
[GitHub] storm pull request #1774: [STORM-2172][SQL][1.x branch] Support Avro as inpu...
GitHub user vesense opened a pull request: https://github.com/apache/storm/pull/1774

[STORM-2172][SQL][1.x branch] Support Avro as input / output format

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vesense/storm STORM-2172-1.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1774.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1774

commit ada5a5b34cd908ee7bb329b17d8b8c2389ffc4c7
Author: Xin Wang
Date: 2016-11-11T15:12:31Z

    [STORM-2172][SQL] Support Avro as input / output format

commit 8e9d4c37b2f8e0c9665cad090d487dc68986575f
Author: Xin Wang
Date: 2016-11-11T15:17:01Z

    [STORM-2172][SQL] version fix
Re: [DISCUSS] breaking changes in 2.x
Windows remains perpetually backwards compatible, even to the point that Windows ships with older broken versions of internal libraries so that if it detects specific software it can load up the old version as needed.

Mac usually provides an upgrade path and will allow apps using up-to-date APIs from the previous version of the OS to run on the new version unchanged. But if you are using a deprecated API you have to change before the next version is released or you will be in trouble, and even some non-deprecated APIs can change at a moment's notice.

The Linux kernel maintains strict compatibility with user space like Windows, which is why Docker can work, but will break kernel modules without too much concern. The GNU user space, however, breaks binary compatibility between releases all the time, but maintains source compatibility (just recompile).

Hadoop will break things between major releases but not between minor releases. There is no guarantee of a rolling upgrade between major releases, which is partly why they are just starting to move towards 3.x and have multiple different flavors of 2.x lines alive. And then there is Guava, where they just don't care.

There are pros and cons to all of these. I thought initially that we had agreed on a model like Hadoop's, although truthfully I don't think we ever formalized any of that, and that is why I started this chain. I really see value, however, in the Mac model. And since I can maintain compatibility, though it is a little painful to do so, I will try to do that. Right now, honestly, I think 2.x could be a rolling upgrade from 1.x, so I will try to maintain that. We may hit a feature where it just will not be possible, but we should discuss that when it happens.

- Bobby

On Thursday, November 10, 2016, 3:06:41 AM CST, Kyle Nusbaum wrote:

On Wednesday, November 9, 2016, 7:23:09 AM CST, Harsha Chintalapani wrote:

> If we want users to upgrade to new version, the rolling upgrade is a major
> decision factor. As a community, we need to look API updates or breaking
> changes much more diligently.

Within a major version, I agree. APIs should be as stable as possible within a version release.

> I agree to an extent we shouldn't limiting ourselves with rolling upgrade.
> But having announced rolling-upgrade in 0.10 and then not supporting it in
> 1.x and now in 2.x. In User's point of view, Storm is not rolling
> upgradable although we shipped a release stating that rolling upgrade is
> supported and in follow-up release we taken that off.

The user would be correct. Storm would not be rolling-upgradable *between major versions.* I don't see how it's possible to develop and improve a project if it must remain perpetually backwards compatible, so I think it's necessary to reject compatibility as a *primary* goal. Eventually (hopefully) we'll arrive at an API that we're happy with and don't feel we need to change. Then we can claim rolling upgrades across major version numbers.

> Does these API changes are critical and worth breaking rolling upgrade?

My position is that we don't want to limit ourselves to "critical" API changes. This would stick us with an inferior API that we can't evolve. It's accepting the long-term pain of an inconsistent API or old baggage to avoid the short-term pain of relaunching or updating topologies when you do a major version upgrade. Storm is not at the place in its life where it has stopped evolving, and I don't want to stifle its development.
[GitHub] storm issue #1754: [STORM-2177][STORM-2173][SQL] Support TSV/CSV as input / ...
Github user vesense commented on the issue: https://github.com/apache/storm/pull/1754

The code is ready for review.