[GitHub] storm pull request #1770: STORM-2197: NimbusClient connections leak due to T...

2016-11-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1770


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1771: STORM-2197: NimbusClient connections leak due to l...

2016-11-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1771




[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format

2016-11-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1751
  
We might need to change the delimiter to ';', not '\n', since with an Avro 
schema definition the statement becomes really hard to edit. (It was hard to 
edit indeed...)
Maybe we need to start a discussion about this on dev@. I'll handle that.




[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format

2016-11-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1751
  
@vesense 
I got an exception while executing a topology:

```
21:12:44.247 [main] INFO  o.a.s.s.r.DataSourcesRegistry - Registering 
scheme kafka with org.apache.storm.sql.kafka.KafkaDataSourcesProvider@338c99c8
Exception in thread "main" java.lang.IllegalStateException: Bolt 
'b-0-LOGICALFILTER_6-LOGICALPROJECT_7' contains a non-serializable field of 
type org.apache.avro.Schema$RecordSchema, which was instantiated prior to 
topology creation. org.apache.avro.Schema$RecordSchema should be instantiated 
within the prepare method of 'b-0-LOGICALFILTER_6-LOGICALPROJECT_7 at the 
earliest.
at 
org.apache.storm.topology.TopologyBuilder.createTopology(TopologyBuilder.java:127)
at 
org.apache.storm.trident.topology.TridentTopologyBuilder.buildTopology(TridentTopologyBuilder.java:265)
at 
org.apache.storm.trident.TridentTopology.build(TridentTopology.java:529)
at org.apache.storm.sql.StormSqlImpl.submit(StormSqlImpl.java:134)
at org.apache.storm.sql.StormSqlRunner.main(StormSqlRunner.java:63)
Caused by: java.lang.RuntimeException: java.io.NotSerializableException: 
org.apache.avro.Schema$RecordSchema
at org.apache.storm.utils.Utils.javaSerialize(Utils.java:235)
at 
org.apache.storm.topology.TopologyBuilder.createTopology(TopologyBuilder.java:122)
... 4 more
Caused by: java.io.NotSerializableException: 
org.apache.avro.Schema$RecordSchema
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
...
```

Maybe `schema` shouldn't be parsed in the constructor. (We could use it there 
for verification, but we shouldn't store the parsed result.)
Instead, you can initialize it on the first call to write() or deserialize(), 
and reuse it afterwards.
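A minimal sketch of that lazy-initialization pattern (the class and field names here are illustrative, not the PR's actual code, and a plain `Object` stands in for `org.apache.avro.Schema`): only the schema *string* is serialized with the topology, and the parsed schema is rebuilt on the worker at first use.

```java
import java.io.Serializable;

// Sketch: keep only the schema string in serialized state; parse lazily.
class AvroSerializerSketch implements Serializable {
    private final String schemaString;      // serializable, shipped with the topology
    private transient Object parsedSchema;  // rebuilt on the worker, never serialized

    AvroSerializerSketch(String schemaString) {
        // The constructor may validate the string, but must not keep the
        // parsed result in a non-transient field.
        this.schemaString = schemaString;
    }

    private Object schema() {
        if (parsedSchema == null) {
            // In real Avro code this would be: new Schema.Parser().parse(schemaString)
            parsedSchema = "parsed:" + schemaString;
        }
        return parsedSchema;
    }

    byte[] write(String record) {
        schema();  // first call triggers the parse; later calls reuse it
        return record.getBytes();
    }
}
```

Because the parsed schema never travels through Java serialization, `TopologyBuilder.createTopology()` no longer hits `NotSerializableException` on `Schema$RecordSchema`.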

Please let me know whether your patch works well with the SQL runner against a 
remote cluster.




[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format

2016-11-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1751
  
Just an idea: we might even infer the Avro schema from the SQL table column 
information. The key part is the mapping between Avro types and SQL types. 
https://github.com/databricks/spark-avro seems to do something similar.
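A rough sketch of what that inference could look like (the class name and the SQL-to-Avro type table below are assumptions, loosely modeled on what spark-avro does): derive the Avro record schema JSON from the column declarations, so users would not have to hand-write `input.avro.schema` / `output.avro.schema`.

```java
import java.util.LinkedHashMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: build an Avro record schema (as a JSON string)
// from SQL column name -> SQL type declarations.
class SqlToAvroSketch {
    private static final Map<String, String> SQL_TO_AVRO = new HashMap<>();
    static {
        SQL_TO_AVRO.put("INT", "int");
        SQL_TO_AVRO.put("BIGINT", "long");
        SQL_TO_AVRO.put("FLOAT", "float");
        SQL_TO_AVRO.put("DOUBLE", "double");
        SQL_TO_AVRO.put("BOOLEAN", "boolean");
        SQL_TO_AVRO.put("VARCHAR", "string");
    }

    // columns is insertion-ordered, e.g. {"ID": "INT", "TOTAL": "INT"}
    static String recordSchema(String name, LinkedHashMap<String, String> columns) {
        StringBuilder fields = new StringBuilder();
        for (Map.Entry<String, String> c : columns.entrySet()) {
            if (fields.length() > 0) fields.append(", ");
            fields.append("{\"name\": \"").append(c.getKey())
                  .append("\", \"type\": \"").append(SQL_TO_AVRO.get(c.getValue()))
                  .append("\"}");
        }
        return "{\"type\": \"record\", \"name\": \"" + name
                + "\", \"fields\": [" + fields + "]}";
    }
}
```

For the `LARGE_ORDERS (ID INT, TOTAL INT)` table this would produce the same record schema the TBLPROPERTIES examples in this thread spell out by hand.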




[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format

2016-11-11 Thread vesense
Github user vesense commented on the issue:

https://github.com/apache/storm/pull/1751
  
@HeartSaVioR Oh, the `NotSerializableException` is because I removed 
`CachedSchemas` in a previous commit. Sorry about that; fixed.
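For readers following along, a `CachedSchemas`-style helper might look like the sketch below (names assumed, not the actual PR code): a per-JVM cache keyed by the schema string, so the serializer only carries the string and each worker parses a given schema at most once. As above, a plain `Object` stands in for `org.apache.avro.Schema`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative per-JVM schema cache: parse each distinct schema string once,
// then hand every caller the same parsed object.
class SchemaCacheSketch {
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();
    private static int parseCount = 0;  // for illustration; not thread-safe

    static Object getOrParse(String schemaString) {
        Object parsed = CACHE.get(schemaString);
        if (parsed == null) {
            parseCount++;
            // Real Avro code: new Schema.Parser().parse(schemaString)
            parsed = "parsed:" + schemaString;
            CACHE.put(schemaString, parsed);
        }
        return parsed;
    }

    static int parses() { return parseCount; }
}
```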




[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format

2016-11-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1751
  
Read JSON, calculate and filter, then store to Avro:

```
CREATE EXTERNAL TABLE ORDERS (ID INT PRIMARY KEY, UNIT_PRICE INT, QUANTITY 
INT) STORED AS INPUTFORMAT 'org.apache.storm.sql.runtime.serde.json.JsonScheme' 
OUTPUTFORMAT 'org.apache.storm.sql.runtime.serde.json.JsonSerializer' LOCATION 
'kafka://localhost:2181/brokers?topic=orders' TBLPROPERTIES '{ "producer": { 
"bootstrap.servers": "localhost:9092", "acks": "1", "key.serializer": 
"org.apache.storm.kafka.IntSerializer", "value.serializer": 
"org.apache.storm.kafka.ByteBufferSerializer" } }'
CREATE EXTERNAL TABLE LARGE_ORDERS (ID INT PRIMARY KEY, TOTAL INT) STORED 
AS INPUTFORMAT 'org.apache.storm.sql.runtime.serde.avro.AvroScheme' 
OUTPUTFORMAT 'org.apache.storm.sql.runtime.serde.avro.AvroSerializer' LOCATION 
'kafka://localhost:2181/brokers?topic=large_orders' TBLPROPERTIES ' { 
"producer": { "bootstrap.servers": "localhost:9092", "acks": "1", 
"key.serializer": "org.apache.storm.kafka.IntSerializer", "value.serializer": 
"org.apache.storm.kafka.ByteBufferSerializer" }, "input.avro.schema": 
"{\"type\": \"record\", \"name\": \"large_orders\", \"fields\" : [ {\"name\": 
\"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", \"type\": \"int\"} ]} ", 
"output.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", 
\"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", 
\"type\": \"int\"} ]} "}'
```
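For readability, the escaped `input.avro.schema` / `output.avro.schema` value embedded in the TBLPROPERTIES above is this Avro record schema, unescaped:

```json
{
  "type": "record",
  "name": "large_orders",
  "fields": [
    {"name": "ID",    "type": "int"},
    {"name": "TOTAL", "type": "int"}
  ]
}
```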

Read Avro and store to JSON:

```
CREATE EXTERNAL TABLE LARGE_ORDERS (ID INT PRIMARY KEY, TOTAL INT) STORED 
AS INPUTFORMAT 'org.apache.storm.sql.runtime.serde.avro.AvroScheme' 
OUTPUTFORMAT 'org.apache.storm.sql.runtime.serde.avro.AvroSerializer' LOCATION 
'kafka://localhost:2181/brokers?topic=large_orders' TBLPROPERTIES ' { 
"producer": { "bootstrap.servers": "localhost:9092", "acks": "1", 
"key.serializer": "org.apache.storm.kafka.IntSerializer", "value.serializer": 
"org.apache.storm.kafka.ByteBufferSerializer" }, "input.avro.schema": 
"{\"type\": \"record\", \"name\": \"large_orders\", \"fields\" : [ {\"name\": 
\"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", \"type\": \"int\"} ]} ", 
"output.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", 
\"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", 
\"type\": \"int\"} ]} "}'
CREATE EXTERNAL TABLE LARGE_ORDERS_JSON (ID INT PRIMARY KEY, TOTAL INT) 
STORED AS INPUTFORMAT 'org.apache.storm.sql.runtime.serde.json.JsonScheme' 
OUTPUTFORMAT 'org.apache.storm.sql.runtime.serde.json.JsonSerializer' LOCATION 
'kafka://localhost:2181/brokers?topic=large_orders_json' TBLPROPERTIES '{ 
"producer": { "bootstrap.servers": "localhost:9092", "acks": "1", 
"key.serializer": "org.apache.storm.kafka.IntSerializer", "value.serializer": 
"org.apache.storm.kafka.ByteBufferSerializer" } }'
INSERT INTO LARGE_ORDERS_JSON SELECT ID, TOTAL FROM LARGE_ORDERS
```

Manual tests succeeded. +1 
Thanks for the great work @vesense 




[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format

2016-11-11 Thread vesense
Github user vesense commented on the issue:

https://github.com/apache/storm/pull/1751
  
Yes, I also find that SQL with an inline Avro schema string is a bit unwieldy; 
it's not easy for users to edit. Maybe I can take some time to improve it later.




[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format

2016-11-11 Thread vesense
Github user vesense commented on the issue:

https://github.com/apache/storm/pull/1751
  
@HeartSaVioR Thanks for your patience.
After this PR gets merged, I will update the TSV/CSV format PR ASAP.




[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format

2016-11-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/1751
  
@vesense Could you craft a patch for the 1.x branch? It needs to compile with 
JDK 1.7.

This is the build failure with JDK 1.7 on 1.x-branch with this patch applied:

```
[INFO] -
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR] 
/Users/jlim/WorkArea/JavaProjects/storm/external/sql/storm-sql-runtime/src/test/org/apache/storm/sql/TestAvroSerializer.java:[47,40]
 incompatible types
  required: java.util.List
  found:
java.util.ArrayList>>
[INFO] 1 error
[INFO] -
[INFO] 

[INFO] Reactor Summary:
[INFO]
[INFO] storm-sql-runtime .. FAILURE [  
2.960 s]
[INFO] storm-sql-core . SKIPPED
[INFO] storm-sql-kafka  SKIPPED
[INFO] storm-sql-redis  SKIPPED
[INFO] storm-sql-mongodb .. SKIPPED
[INFO] sql  SKIPPED
[INFO] 

[INFO] BUILD FAILURE
[INFO] 

[INFO] Total time: 3.844 s
[INFO] Finished at: 2016-11-11T23:11:46+09:00
[INFO] Final Memory: 35M/452M
[INFO] 

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
(default-testCompile) on project storm-sql-runtime: Compilation failure
[ERROR] 
/Users/jlim/WorkArea/JavaProjects/storm/external/sql/storm-sql-runtime/src/test/org/apache/storm/sql/TestAvroSerializer.java:[47,40]
 incompatible types
[ERROR] required: java.util.List
[ERROR] found:
java.util.ArrayList>>
[ERROR] -> [Help 1]
```
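The quoted log lost its generic type parameters, so the exact mismatch is unrecoverable from this digest; the sketch below (illustrative, not the actual `TestAvroSerializer` code) shows the general class of failure. javac 1.8 infers generic-method type arguments from the target type in more places than javac 1.7, so test code that compiles on JDK 8 can fail on JDK 7 with an "incompatible types: required java.util.List / found java.util.ArrayList" style error. An explicit type witness compiles on both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Jdk7InferenceSketch {
    static <T> List<T> listOf(T a, T b) {
        return new ArrayList<T>(Arrays.asList(a, b));
    }

    static List<Map<String, Integer>> build() {
        // On JDK 7, calling listOf with two HashMaps can infer a type such as
        // List<HashMap<String, Integer>>, which is not assignable to
        // List<Map<String, Integer>>. The explicit <Map<String, Integer>>
        // witness pins the type argument down on both JDK 7 and 8.
        return Jdk7InferenceSketch.<Map<String, Integer>>listOf(
                new HashMap<String, Integer>(), new HashMap<String, Integer>());
    }
}
```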




[GitHub] storm pull request #1751: [STORM-2172][SQL] Support Avro as input / output f...

2016-11-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1751




[GitHub] storm issue #1751: [STORM-2172][SQL] Support Avro as input / output format

2016-11-11 Thread vesense
Github user vesense commented on the issue:

https://github.com/apache/storm/pull/1751
  
@HeartSaVioR OK. I will create a PR for the 1.x branch.




[GitHub] storm pull request #1774: [STORM-2172][SQL][1.x branch] Support Avro as inpu...

2016-11-11 Thread vesense
GitHub user vesense opened a pull request:

https://github.com/apache/storm/pull/1774

[STORM-2172][SQL][1.x branch] Support Avro as input / output format



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vesense/storm STORM-2172-1.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/1774.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1774


commit ada5a5b34cd908ee7bb329b17d8b8c2389ffc4c7
Author: Xin Wang 
Date:   2016-11-11T15:12:31Z

[STORM-2172][SQL] Support Avro as input / output format

commit 8e9d4c37b2f8e0c9665cad090d487dc68986575f
Author: Xin Wang 
Date:   2016-11-11T15:17:01Z

[STORM-2172][SQL] version fix






Re: [DISCUSS] breaking changes in 2.x

2016-11-11 Thread Bobby Evans
Windows remains perpetually backwards compatible, even to the point that 
Windows ships with older, broken versions of internal libraries so that if it 
detects specific software it can load up the old version as needed. 

Mac usually provides an upgrade path and will allow apps using up-to-date APIs 
from the previous version of the OS to run on the new version unchanged. But if 
you are using a deprecated API you have to change before the next version is 
released or you will be in trouble, and even some non-deprecated APIs can 
change at a moment's notice.
The Linux kernel maintains strict compatibility with user space, like Windows, 
which is why Docker can work, but will break kernel modules without too much 
concern. The GNU user space, however, breaks binary compatibility between 
releases all the time, but maintains source compatibility (just recompile).
Hadoop will break things between major releases but not between minor releases. 
There is no guarantee of a rolling upgrade between major releases, which is 
partly why they are just starting to move towards 3.x and have multiple 
different flavors of 2.x lines alive.
And then there is Guava, where they just don't care.

There are pros and cons to all of these. I thought initially that we had 
agreed on a model like Hadoop's, although truthfully I don't think we ever 
formalized any of that, and that is why I started this thread. I really see 
value, however, in the Mac model. Since I can maintain compatibility, though 
it is a little painful to do so, I will try to do that. Right now, honestly, I 
think 2.x could be a rolling upgrade from 1.x, so I will try to maintain that. 
We may hit a feature where it just will not be possible to do that, but we 
should discuss that when it happens.

- Bobby

On Thursday, November 10, 2016, 3:06:41 AM CST, Kyle Nusbaum wrote:

On Wednesday, November 9, 2016, 7:23:09 AM CST, Harsha Chintalapani wrote:

> If we want users to upgrade to a new version, the rolling upgrade is a major
> decision factor. As a community, we need to look at API updates or breaking
> changes much more diligently.

Within a major version, I agree. APIs should be as stable as possible within a 
version release.

> I agree to an extent that we shouldn't limit ourselves to rolling upgrades.
> But we announced rolling upgrade in 0.10 and then didn't support it in
> 1.x, and now in 2.x. From the user's point of view, Storm is not rolling
> upgradable, although we shipped a release stating that rolling upgrade is
> supported and in a follow-up release we took that off.

The user would be correct. Storm would not be rolling-upgradable *between major 
versions.* I don't see how it's possible to develop and improve a project if it 
must remain perpetually backwards compatible, so I think it's necessary to 
reject compatibility as a *primary* goal.
Eventually (hopefully) we'll arrive at an API that we're happy with and don't 
feel we need to change. Then we can claim rolling upgrades across major 
version numbers.

> Are these API changes critical and worth breaking rolling upgrade?

My position is that we don't want to limit ourselves to "critical" API changes. 
That would stick us with an inferior API that we can't evolve. It's accepting 
the long-term pain of an inconsistent API or old baggage to avoid the 
short-term pain of relaunching or updating topologies when you do a major 
version upgrade.
Storm is not at the place in its life where it has stopped evolving, and I 
don't want to stifle its development.


[GitHub] storm issue #1754: [STORM-2177][STORM-2173][SQL] Support TSV/CSV as input / ...

2016-11-11 Thread vesense
Github user vesense commented on the issue:

https://github.com/apache/storm/pull/1754
  
The code is ready for review.

