[GitHub] incubator-metron issue #518: METRON-799: The MPack should function in a kerb...

2017-04-19 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/518
  
I've run this up on an EC2 cluster and been able to get data through the core 
topologies from Kafka to ES/HDFS.  A couple of caveats are now spelled out in a 
short Kerberos section of the metron-deployment README, which also mentions 
that the mpack supports Ambari's Kerberization process.

The client process managed to ensure keytabs existed and that the 
client_jaas was created appropriately.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-metron issue #518: METRON-799: The MPack should function in a kerb...

2017-04-19 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/518
  
Update on this:

After a significant amount of pain getting this to work, it turns out we need 
the client_jaas.conf and the metron keytab distributed to the various 
supervisor nodes so that they can actually be authenticated/authorized as the 
metron user.

The best solution we've found is to have a client that essentially just sets 
them up (so we can actually create things on the various supervisor servers).  
It's not ideal, but @dlyle65535 and I are testing that it works.

To actually set this up, just install Metron clients on all Storm 
supervisor nodes prior to Kerberization.  Docs will be updated once we confirm 
that this approach works.






[GitHub] incubator-metron issue #532: METRON-634 Mpack bug fixes and improvements, no...

2017-04-17 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/532
  
This is great, thanks for cleaning a lot of this up.  When I get a chance, 
I look forward to spinning this up.




[GitHub] incubator-metron issue #518: METRON-799: The MPack should function in a kerb...

2017-04-12 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/518
  
Updated to use the built-in check for security in the stack advisor.  The old 
version caused problems because of typing in Python, and I hadn't caught what 
it actually does.  Params files should still be fine, because the config 
objects under the covers in Ambari explicitly convert strings to booleans where 
appropriate.
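
To illustrate the kind of typing problem described above: in Python, any non-empty string is truthy, so a flag that arrives as the string "false" passes a plain `if` check. The snippet below is only a sketch of that pitfall. `to_bool` is a hypothetical helper, not the stack advisor's built-in check and not Ambari's own conversion, and the raw value is an assumed example.

```python
def to_bool(value):
    """Hypothetical helper: interpret 'true'/'false' strings as booleans."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == 'true'


raw = 'false'  # assumed example of a flag arriving as a string

naive = bool(raw)          # truthiness check: any non-empty string counts as true
converted = to_bool(raw)   # content check: compares the text itself

print(naive, converted)
```

The point of the sketch is only that a truthiness check and a content check disagree on the string "false", which is why an explicit conversion matters.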




[GitHub] incubator-metron issue #500: METRON-795: Install Metron REST with Ambari MPa...

2017-04-11 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/500
  
"confident this PR", not "confident that".  In retrospect, it sounds like 
I'm referring to your last comment, not just in general.




[GitHub] incubator-metron issue #500: METRON-795: Install Metron REST with Ambari MPa...

2017-04-11 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/500
  
@merrimanr I'm pretty confident that is going to collide unpleasantly with 
https://github.com/apache/incubator-metron/pull/518 (The MPack should function 
in a kerberized cluster).

I haven't thought through the details, but do you think it's potentially 
sufficient to make this component optional (cardinality 0+, rather than 1)?  
It's an extra step for the user, but it's only selecting it on the component 
list.  At that point, we say the REST API + UI is only supported on 
non-kerberized clusters (which is true anyway), and we (hopefully) don't break 
Kerberizing the core components.




[GitHub] incubator-metron issue #518: METRON-799: The MPack should function in a kerb...

2017-04-11 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/518
  
Full dev is able to start up successfully, be Kerberized via Ambari without 
errors (including the Storm service check), and have new data flow through.




[GitHub] incubator-metron issue #518: METRON-799: The MPack should function in a kerb...

2017-04-11 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/518
  
Still need to do some additional testing (I'm spiin...), but this is updated to 
pull topology.auto-credentials down to the topology level.

This should let Storm actually run its service check (because AutoTGT isn't 
initializing).  At the same time, Metron should be able to set these 
appropriately and run them.  This involves passing the config down to the 
various files (the .properties files that feed Flux) and updating the 
integration tests slightly.

Oddly enough, the use of the tickets in the client_jaas file caused issues, 
so it's now using the ticket cache and running a kinit before acting on 
topologies.  I strongly dislike having the kinits, but there's no obvious 
reason for the difference in behavior between ticket cache and keytab.
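
The kinit step mentioned above could look roughly like the following. This is a sketch only: `build_kinit_command` is a hypothetical helper, and the keytab path and principal are made-up placeholders rather than values from the PR. It relies on the standard `kinit -kt <keytab> <principal>` form for filling the ticket cache from a keytab.

```python
def build_kinit_command(keytab_path, principal, kinit_path='kinit'):
    """Return an argv list that populates the ticket cache from a keytab."""
    if not keytab_path or not principal:
        raise ValueError('keytab_path and principal are both required')
    # -k: authenticate with a keytab, -t: path to that keytab
    return [kinit_path, '-kt', keytab_path, principal]


# Placeholder values for illustration only.
command = build_kinit_command('/path/to/metron.keytab', 'metron@EXAMPLE.COM')
print(' '.join(command))
```

A list like this could be handed to `subprocess.check_call` as the metron user before any topology action, so that later commands find a valid ticket in the cache.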






[GitHub] incubator-metron issue #518: METRON-799: The MPack should function in a kerb...

2017-04-07 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/518
  
So, coming back to the AutoTGT discussion and making it easier to see: 
there are basically two approaches we can take that (should) work.

1. The current use of AutoTGT.  This requires setting up .storm/storm.yaml 
for the Storm nodes, and users will have to do the same.
2. The AutoHDFS / AutoHBase solution mentioned by @dlyle65535.  This 
requires symlinking some jars and configs to make them available to Storm.  
Should only be necessary for Nimbuses, but it does mean that HDFS/HBase 
upgrades can make the symlinks stale.

Either solution requires Ambari to act on a different node (potentially) 
than the one running the scripts.  I don't know if Ambari has any resources for 
handling that sort of thing.  It could potentially be a command over ssh to 
another node, but presumably that requires passwordless ssh setup or someone to 
manually create the symlinks (which may be acceptable for this pass).

I'm fairly strongly inclined towards the second one, primarily because it 
requires less effort on the user's part.  The Ambari work is fairly similar 
either way.
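
The stale-symlink risk noted for the second approach can be checked mechanically. The following is a minimal sketch under stated assumptions: `find_stale_symlinks` is a hypothetical helper, and the directory it scans is whatever location ends up holding the symlinked jars and configs, which this thread does not specify.

```python
import os


def find_stale_symlinks(directory):
    """Return sorted names of symlinks in `directory` whose targets are missing."""
    stale = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # islink() is true for any symlink; exists() follows the link,
        # so it is false once the target has gone away.
        if os.path.islink(path) and not os.path.exists(path):
            stale.append(name)
    return sorted(stale)
```

Something like this could run after an HDFS/HBase upgrade to flag links that need to be recreated.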




[GitHub] incubator-metron pull request #518: METRON-799: The MPack should function in...

2017-04-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/518#discussion_r110413918
  
--- Diff: 
metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/metron_security.py
 ---
@@ -0,0 +1,74 @@
+"""
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+import os.path
+from resource_management.core.source import Template
+from resource_management.core.resources.system import Directory, File
+from resource_management.core import global_lock
+from resource_management.core.logger import Logger
+from resource_management.core.resources.system import Execute
+from resource_management.libraries.functions import format as ambari_format
+
+
+# Convenience function for ensuring home dirs are setup consistently.
+def storm_security_setup(params):
+    if params.security_enabled:
+        # I don't think there's an Ambari way to get a user's local home dir, so have Python perform tilde expansion.
+        # Ambari's Directory doesn't do tilde expansion.
+        metron_storm_dir_tilde = '~' + params.metron_user + '/.storm'
+        metron_storm_dir = os.path.expanduser(metron_storm_dir_tilde)
+        Directory(metron_storm_dir,
+                  mode=0755,
+                  owner=params.metron_user,
+                  group=params.metron_group
+                  )
+
+        File(ambari_format('{client_jaas_path}'),
+             content=Template('client_jaas.conf.j2'),
+             owner=params.metron_user,
+             group=params.metron_group,
+             mode=0755
+             )
+
+        File(metron_storm_dir + '/storm.yaml',
+             content=Template('storm.yaml.j2'),
+             owner=params.metron_user,
+             group=params.metron_group,
+             mode=0755
+             )
+
--- End diff --

Yeah, you're right, you should be able to change it via HA.  I just meant I 
can't go and change it to something in full dev.

Re: the AutoTGT stuff
I'm unsure if HA Storm would be an issue, but assuming Storm handles 
spinning up secure nimbuses correctly it shouldn't be.  New nimbuses are 
required to make sure they get all code of active topologies.  I would assume 
this includes getting any TGTs (although I haven't verified this) to ensure 
that active topologies can continue to run.  We'd need to test on an actual HA 
cluster.

I disagree with AutoHDFS/AutoHBase being less complex, because if I recall 
correctly (and I don't, please correct me), that solution required setting up 
symlinks on each Storm node.  I don't even know that we easily have that 
capability in Ambari as the Metron service. Even if we do work around that, I 
don't see AutoTGT as complicated, and the Storm docs mention "On a kerberos 
secure cluster they should be set by default to point to 
org.apache.storm.security.auth.kerberos.AutoTGT."  It seems reasonable to go 
with Storm's recommendation unless we have a compelling reason not to.

The custom storm.yaml only contains the 3 configs that we need to run:
1. Nimbus seeds (gathered from Storm configs themselves)
2. Jaas file
3. Thrift protocol (which admittedly doesn't grab correctly from Storm, but 
is essentially a constant for the purpose of Kerberos).

Of these, only the jaas file really matters for management, and all it does 
is define named stanzas, which isn't anything particularly complicated.
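
As a rough picture of how small that file is, the sketch below renders the three settings as storm.yaml text. It is an illustration, not the PR's `storm.yaml.j2`: `render_storm_yaml` is a hypothetical helper, the jaas path and seed host are placeholders, and the key names (`nimbus.seeds`, `java.security.auth.login.config`, `storm.thrift.transport`) are Storm's standard ones, assumed here to be the three referred to above.

```python
def render_storm_yaml(nimbus_seeds, jaas_path, thrift_transport):
    """Render the three client-side settings as storm.yaml text."""
    seeds = ', '.join('"{0}"'.format(seed) for seed in nimbus_seeds)
    lines = [
        'nimbus.seeds: [{0}]'.format(seeds),
        'java.security.auth.login.config: "{0}"'.format(jaas_path),
        'storm.thrift.transport: "{0}"'.format(thrift_transport),
    ]
    return '\n'.join(lines) + '\n'


# Placeholder values; the real ones would come from the cluster's Storm configs.
print(render_storm_yaml(
    ['node1'],
    '/path/to/client_jaas.conf',
    'org.apache.storm.security.auth.kerberos.KerberosSaslTransportPlugin',
))
```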

Having said that, the Storm service check could make AutoTGT a lot less 
attractive if it's not easily workable.




[GitHub] incubator-metron pull request #518: METRON-799: The MPack should function in...

2017-04-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/518#discussion_r110411025
  
--- Diff: 
metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/metron_security.py
 ---
@@ -0,0 +1,74 @@
+"""
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+import os.path
+from resource_management.core.source import Template
+from resource_management.core.resources.system import Directory, File
+from resource_management.core import global_lock
+from resource_management.core.logger import Logger
+from resource_management.core.resources.system import Execute
+from resource_management.libraries.functions import format as ambari_format
+
+
+# Convenience function for ensuring home dirs are setup consistently.
+def storm_security_setup(params):
+    if params.security_enabled:
+        # I don't think there's an Ambari way to get a user's local home dir, so have Python perform tilde expansion.
+        # Ambari's Directory doesn't do tilde expansion.
+        metron_storm_dir_tilde = '~' + params.metron_user + '/.storm'
+        metron_storm_dir = os.path.expanduser(metron_storm_dir_tilde)
+        Directory(metron_storm_dir,
+                  mode=0755,
+                  owner=params.metron_user,
+                  group=params.metron_group
+                  )
+
+        File(ambari_format('{client_jaas_path}'),
+             content=Template('client_jaas.conf.j2'),
+             owner=params.metron_user,
+             group=params.metron_group,
+             mode=0755
+             )
+
+        File(metron_storm_dir + '/storm.yaml',
+             content=Template('storm.yaml.j2'),
+             owner=params.metron_user,
+             group=params.metron_group,
+             mode=0755
+             )
+
--- End diff --

It's not possible to change it via Ambari, but it will populate the 
variable appropriately.  There's some translation magic happening somewhere, 
because Ambari lists the value of nimbus.seeds as 'node1', but it actually 
passes it down as a list (which is why the new commit doesn't surround it with 
square brackets). I'm guessing this magic is also why the thrift config doesn't 
get passed.
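
The square-bracket point can be reproduced in plain Python. This is a sketch in which `str.format` stands in for the template engine (an assumption; the real rendering goes through the Jinja2 template), and `'node1'` is the value quoted above.

```python
seeds = ['node1']  # the value as passed down: already a list

# A list's string form already carries its own square brackets.
without_brackets = 'nimbus.seeds: {0}'.format(seeds)
with_brackets = 'nimbus.seeds: [{0}]'.format(seeds)

print(without_brackets)  # nimbus.seeds: ['node1']
print(with_brackets)     # nimbus.seeds: [['node1']]
```

Wrapping the already-rendered list in another pair of brackets yields a nested list, which is why the template has to leave them out.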




[GitHub] incubator-metron pull request #518: METRON-799: The MPack should function in...

2017-04-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/518#discussion_r110405590
  
--- Diff: 
metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/metron_security.py
 ---
@@ -0,0 +1,74 @@
+"""
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+import os.path
+from resource_management.core.source import Template
+from resource_management.core.resources.system import Directory, File
+from resource_management.core import global_lock
+from resource_management.core.logger import Logger
+from resource_management.core.resources.system import Execute
+from resource_management.libraries.functions import format as ambari_format
+
+
+# Convenience function for ensuring home dirs are setup consistently.
+def storm_security_setup(params):
+    if params.security_enabled:
+        # I don't think there's an Ambari way to get a user's local home dir, so have Python perform tilde expansion.
+        # Ambari's Directory doesn't do tilde expansion.
+        metron_storm_dir_tilde = '~' + params.metron_user + '/.storm'
+        metron_storm_dir = os.path.expanduser(metron_storm_dir_tilde)
+        Directory(metron_storm_dir,
+                  mode=0755,
+                  owner=params.metron_user,
+                  group=params.metron_group
+                  )
+
+        File(ambari_format('{client_jaas_path}'),
+             content=Template('client_jaas.conf.j2'),
+             owner=params.metron_user,
+             group=params.metron_group,
+             mode=0755
+             )
+
+        File(metron_storm_dir + '/storm.yaml',
+             content=Template('storm.yaml.j2'),
+             owner=params.metron_user,
+             group=params.metron_group,
+             mode=0755
+             )
+
--- End diff --

nimbus.seeds is updated to grab the actual value.  Oddly, 
storm.thrift.transport doesn't end up flowing correctly to the template.  I 
double checked all the spelling and so on, and it always ended up just being 
the variable, not the value.  I'm inclined to let that one go, since it's only 
even created in a Kerberized environment anyway and it's a constant.




[GitHub] incubator-metron issue #518: METRON-799: The MPack should function in a kerb...

2017-04-07 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/518
  
@dlyle65535 I'll take a look at nimbus.admins.  I'm pretty confident it's 
running as the Storm user, though.
Partial output from Ambari:
```
Traceback (most recent call last):
 ...
user=params.storm_user
```




[GitHub] incubator-metron pull request #518: METRON-799: The MPack should function in...

2017-04-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/518#discussion_r110388704
  
--- Diff: 
metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/metron_security.py
 ---
@@ -0,0 +1,74 @@
+"""
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+import os.path
+from resource_management.core.source import Template
+from resource_management.core.resources.system import Directory, File
+from resource_management.core import global_lock
+from resource_management.core.logger import Logger
+from resource_management.core.resources.system import Execute
+from resource_management.libraries.functions import format as ambari_format
+
+
+# Convenience function for ensuring home dirs are setup consistently.
+def storm_security_setup(params):
+    if params.security_enabled:
+        # I don't think there's an Ambari way to get a user's local home dir, so have Python perform tilde expansion.
+        # Ambari's Directory doesn't do tilde expansion.
+        metron_storm_dir_tilde = '~' + params.metron_user + '/.storm'
+        metron_storm_dir = os.path.expanduser(metron_storm_dir_tilde)
+        Directory(metron_storm_dir,
+                  mode=0755,
+                  owner=params.metron_user,
+                  group=params.metron_group
+                  )
+
+        File(ambari_format('{client_jaas_path}'),
+             content=Template('client_jaas.conf.j2'),
+             owner=params.metron_user,
+             group=params.metron_group,
+             mode=0755
+             )
+
--- End diff --

My understanding, and correct me if I'm wrong, is that nimbus distributes 
and renews the TGT from Kerberos.  I'm honestly not sure how we'd get 
everything to line up otherwise if we have to distribute keytabs and jaas files 
and everything else out.

This does raise the question of what happens when the max renewal of a 
Kerberos ticket has passed. I don't know enough about the implementation to 
know what happens, and I'd be interested in thoughts.
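
As background for the question above: a renewable Kerberos ticket carries a renew-until time, and once that time has passed the ticket can no longer be renewed, so a fresh one has to be obtained (for example from a keytab). The sketch below only models that cutoff. The times and the 7-day renewable lifetime are made up, `can_renew` is a hypothetical helper, and nothing here describes how Storm or AutoTGT actually handles the case.

```python
from datetime import datetime, timedelta


def can_renew(now, renew_until):
    """A ticket can only be renewed before its renew-until time."""
    return now < renew_until


issued = datetime(2017, 4, 7, 12, 0, 0)    # made-up issue time
renew_until = issued + timedelta(days=7)   # assumed maximum renewable lifetime

print(can_renew(issued + timedelta(days=6), renew_until))
print(can_renew(issued + timedelta(days=8), renew_until))
```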




[GitHub] incubator-metron pull request #518: METRON-799: The MPack should function in...

2017-04-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/518#discussion_r110385969
  
--- Diff: 
metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/enrichment_commands.py
 ---
@@ -131,47 +153,35 @@ def init_geo(self):
         self.set_geo_configured()
 
     def init_kafka_topics(self):
-        Logger.info('Creating Kafka topics')
-        command_template = """{0}/kafka-topics.sh \
-                                --zookeeper {1} \
-                                --create \
-                                --topic {2} \
-                                --partitions {3} \
-                                --replication-factor {4} \
-                                --config retention.bytes={5}"""
-        num_partitions = 1
-        replication_factor = 1
-        retention_gigabytes = int(self.__params.metron_topic_retention)
-        retention_bytes = retention_gigabytes * 1024 * 1024 * 1024
-
-        Logger.info("Creating topics for enrichment")
-        topics = [self.__enrichment_topic]
-        for topic in topics:
-            Logger.info("Creating topic'{0}'".format(topic))
-            Execute(command_template.format(self.__params.kafka_bin_dir,
-                                            self.__params.zookeeper_quorum,
-                                            topic,
-                                            num_partitions,
-                                            replication_factor,
-                                            retention_bytes))
-
-        Logger.info("Done creating Kafka topics")
+        Logger.info('Creating Kafka topics for enrichment')
+        # All errors go to indexing topics, so create it here if it's not already
+        metron_service.init_kafka_topics(self.__params, [self.__enrichment_topic, self.__params.metron_error_topic])
         self.set_kafka_configured()
 
+    def init_kafka_acls(self):
+        Logger.info('Creating Kafka topics')
--- End diff --

Yes, yes I did.




[GitHub] incubator-metron pull request #518: METRON-799: The MPack should function in...

2017-04-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/518#discussion_r110385783
  
--- Diff: 
metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/indexing_commands.py
 ---
@@ -72,55 +93,46 @@ def remote_repo():
             raise ValueError("Unsupported repo type '{0}'".format(repo_type))
 
     def init_kafka_topics(self):
-        Logger.info('Creating Kafka topics')
-        command_template = """{0}/kafka-topics.sh \
-                                --zookeeper {1} \
-                                --create \
-                                --topic {2} \
-                                --partitions {3} \
-                                --replication-factor {4} \
-                                --config retention.bytes={5}"""
-        num_partitions = 1
-        replication_factor = 1
-        retention_gigabytes = int(self.__params.metron_topic_retention)
-        retention_bytes = retention_gigabytes * 1024 * 1024 * 1024
-        Logger.info("Creating topics for indexing")
-
-        Logger.info("Creating topic'{0}'".format(self.__indexing))
-        Execute(command_template.format(self.__params.kafka_bin_dir,
-                                        self.__params.zookeeper_quorum,
-                                        self.__indexing,
-                                        num_partitions,
-                                        replication_factor,
-                                        retention_bytes))
-        Logger.info("Done creating Kafka topics")
+        Logger.info('Creating Kafka topics for indexing')
+        metron_service.init_kafka_topics(self.__params, [self.__indexing])
+
+    def init_kafka_acls(self):
+        Logger.info('Creating Kafka ACLs')
+        # Indexed topic names matches the group
+        metron_service.init_kafka_acls(self.__params, [self.__indexing], [self.__indexing])
 
     def init_hdfs_dir(self):
-        Logger.info('Creating HDFS indexing directory')
+        Logger.info('Setting up HDFS indexing directory')
+
+        # Non Kerberized Metron runs under 'storm', requiring write under the 'hadoop' group.
+        # Kerberized Metron runs under it's own user.
+        ownership = 0755 if self.__params.security_enabled else 0775
+        Logger.info('HDFS indexing directory ownership is: ' + str(ownership))
 
         self.__params.HdfsResource(self.__params.metron_apps_indexed_hdfs_dir,
                                    type="directory",
                                    action="create_on_execute",
                                    owner=self.__params.metron_user,
                                    group=self.__params.hadoop_group,
--- End diff --

I decided not to mess with it.  If we have a preference on it not being 
owned by metron:hadoop, we can go ahead and do that, but I think we probably 
need a more thorough discussion of how we want all that owned and permissioned 
anyway.  Leaving it only readable seemed like a reasonable compromise for now. 
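
The mode choice in the diff above comes down to the group write bit. The sketch below restates it in isolation; `indexing_dir_mode` is a hypothetical name, and the octal literals are written as `0o755`/`0o775` (valid on Python 2.6+ and 3) rather than the Python 2 form `0755` used in the diff.

```python
def indexing_dir_mode(security_enabled):
    """Pick the permission bits for the HDFS indexing directory."""
    # 0o755: group can read and traverse only; 0o775: group can also write.
    return 0o755 if security_enabled else 0o775


for enabled in (True, False):
    mode = indexing_dir_mode(enabled)
    group_can_write = bool(mode & 0o020)
    print('security_enabled={0} group_can_write={1}'.format(enabled, group_can_write))
```

With security enabled the directory stays group-readable but not group-writable, which matches the "only readable" compromise described above.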




[GitHub] incubator-metron pull request #518: METRON-799: The MPack should function in...

2017-04-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/518#discussion_r110385480
  
--- Diff: 
metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/metron_security.py
 ---
@@ -0,0 +1,74 @@
+"""
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+import os.path
+from resource_management.core.source import Template
+from resource_management.core.resources.system import Directory, File
+from resource_management.core import global_lock
+from resource_management.core.logger import Logger
+from resource_management.core.resources.system import Execute
+from resource_management.libraries.functions import format as ambari_format
+
+
+# Convenience function for ensuring home dirs are setup consistently.
+def storm_security_setup(params):
+    if params.security_enabled:
+        # I don't think there's an Ambari way to get a user's local home dir, so have Python perform tilde expansion.
+        # Ambari's Directory doesn't do tilde expansion.
+        metron_storm_dir_tilde = '~' + params.metron_user + '/.storm'
+        metron_storm_dir = os.path.expanduser(metron_storm_dir_tilde)
+        Directory(metron_storm_dir,
+                  mode=0755,
+                  owner=params.metron_user,
+                  group=params.metron_group
+                  )
+
+        File(ambari_format('{client_jaas_path}'),
+             content=Template('client_jaas.conf.j2'),
+             owner=params.metron_user,
+             group=params.metron_group,
+             mode=0755
+             )
+
--- End diff --

All the stuff in (or referenced in the case of client_jaas) ~metron/.storm 
should just need to be on the Metron node, because everything gets kicked off 
there.




[GitHub] incubator-metron pull request #518: METRON-799: The MPack should function in...

2017-04-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/518#discussion_r110385183
  
--- Diff: 
metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/metron_security.py
 ---
@@ -0,0 +1,74 @@
+"""
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+import os.path
+from resource_management.core.source import Template
+from resource_management.core.resources.system import Directory, File
+from resource_management.core import global_lock
+from resource_management.core.logger import Logger
+from resource_management.core.resources.system import Execute
+from resource_management.libraries.functions import format as ambari_format
+
+
+# Convenience function for ensuring home dirs are setup consistently.
+def storm_security_setup(params):
+    if params.security_enabled:
+        # I don't think there's an Ambari way to get a user's local home dir, so have Python perform tilde expansion.
+        # Ambari's Directory doesn't do tilde expansion.
+        metron_storm_dir_tilde = '~' + params.metron_user + '/.storm'
+        metron_storm_dir = os.path.expanduser(metron_storm_dir_tilde)
+        Directory(metron_storm_dir,
+                  mode=0755,
+                  owner=params.metron_user,
+                  group=params.metron_group
+                  )
+
+        File(ambari_format('{client_jaas_path}'),
+             content=Template('client_jaas.conf.j2'),
+             owner=params.metron_user,
+             group=params.metron_group,
+             mode=0755
+             )
+
+        File(metron_storm_dir + '/storm.yaml',
+             content=Template('storm.yaml.j2'),
+             owner=params.metron_user,
+             group=params.metron_group,
+             mode=0755
+             )
+
--- End diff --

Just the properties in here.  Couple thoughts as you bring this up.

We probably want to have a ticket to make sure turning off Kerberos works 
correctly in the future.  The properties in the file (except nimbus.seeds) are 
essentially set properties.  We need our own client_jaas, and the 
storm.thrift.transport has to be there for some reason and that's pretty much 
constant on a secure cluster.  That should also be made a property that flows 
down from Storm.

nimbus.seeds is wrong and I need to carry that over from the actual Storm 
property. And I even made sure it was in params_linux and forgot to use it.
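
To make the nimbus.seeds point concrete, here is a rough sketch of rendering the per-user storm.yaml from values passed in, so the seeds come from the real Storm property instead of being hard-coded. The function name and the example JAAS path are hypothetical; the three property names are the ones from the PR description, and the actual template in the mpack may look different.

```python
def render_client_storm_yaml(nimbus_seeds, client_jaas_path):
    """Build the text of a per-user storm.yaml (illustrative sketch only)."""
    # nimbus_seeds is expected to be carried over from Storm's own config.
    seeds = ', '.join("'%s'" % host for host in nimbus_seeds)
    lines = [
        "nimbus.seeds : [%s]" % seeds,
        "java.security.auth.login.config : '%s'" % client_jaas_path,
        "storm.thrift.transport : "
        "'org.apache.storm.security.auth.kerberos.KerberosSaslTransportPlugin'",
    ]
    return '\n'.join(lines) + '\n'
```

Calling it with something like `render_client_storm_yaml(['node1'], '/home/metron/.storm/client_jaas.conf')` (path made up for the example) gives the three lines the template needs.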




[GitHub] incubator-metron pull request #518: METRON-799: The MPack should function in...

2017-04-07 Thread justinleet
GitHub user justinleet opened a pull request:

https://github.com/apache/incubator-metron/pull/518

METRON-799: The MPack should function in a kerberized cluster

## Contributor Comments
Allows the Ambari Kerberos wizard to handle Metron setup.

Changes include:
- Creation of Keytabs
- Running everything as the Metron user, including Storm topology workers 
(on a Kerberized cluster).
- Setup for Metron user to actually be able to run (client_jaas setup, home 
Storm dir setup, etc.)
- Adjusting perms to 0755.  The exception is the HDFS output folder, which on a 
non-Kerberized cluster is left as 0775 because Storm does not run workers as the 
metron user there.  When Kerberizing, the permissions are restricted down to 
0755.
- Kafka ACLs
- HBase ACLs
- Refactored Topic creation to use a common function so I didn't have to 
edit the same thing 3 times.
- Automated updating of Storm configs (the AutoTGT and running workers as 
user)

There's still more testing I want to do, but this is definitely far enough 
along to submit a PR.

I've spun this up on full dev with the now modified Kerberos setup 
instructions, with the caveat that Ambari's Storm service check fails (it's 
harmless, as far as I can tell).  See below for more details.  As this does not 
touch the sensors, data will need to be pushed manually (same as the old 
instructions).  I've been able to push data from Kafka to Elasticsearch/HDFS.

### The Bad News
I would love insight on a problem, if anybody has some.  I haven't edited 
the docs to reflect this yet, in the hopes it'll be resolved.

Storm's service check will fail during (and after) Kerberization.  Metron 
can immediately be started perfectly fine.  Nothing is legitimately wrong, but 
this setup means that the storm user is unable to submit to the cluster (it 
doesn't have its home directory set up with some configs).  Unfortunately, 
Ambari runs the service check as the storm user.

This can be worked around by creating ~storm/.storm/storm.yaml
```
nimbus.seeds : ['node1']
java.security.auth.login.config : '/usr/hdp/current/storm-supervisor/conf/storm_jaas.conf'
storm.thrift.transport : 'org.apache.storm.security.auth.kerberos.KerberosSaslTransportPlugin'
```
`java.security.auth.login.config` can also be 
`/etc/storm/conf/storm_jaas.conf`, but the value above leads me to my next 
point.  All of these values already exist in storm.yaml.  The fact that they 
need to be specified again in the user's home is really strange. And it'll give 
an error that the TGT found is not renewable, which is not something you'd 
expect.

I'm unsure if there are restrictions on where Ambari chooses to run the service 
check, so it's possible this would have to be set up on every node in the 
cluster where Storm lives. I'm also unsure if we can actually have Ambari 
automate this if it turns out to be necessary, since we aren't the Storm service.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron (Incubating).  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
- [x] Does your PR title start with METRON- where  is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [ ] Have you ensured that the full suite of tests and checks have been 
executed in the root incubating-metron folder via:
  ```
  mvn -q clean integration-test install && build_utils/verify_licenses.sh 
  ```

- ~Have you written or updated unit tests and or integration tests to 
verify your changes?~
- ~If adding new dependencies to the code, are these dependencies licensed 
in a way that is compatible for inclusion under [ASF 

[GitHub] incubator-metron issue #506: METRON-818: Ambari elasticsearch.properties tem...

2017-04-03 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/506
  
Thanks for running it up and verifying it.




[GitHub] incubator-metron issue #506: METRON-818: Ambari elasticsearch.properties tem...

2017-04-03 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/506
  
@JonZeolla did you have any issues spinning this up?




[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

2017-04-03 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/505#discussion_r109447598
  
--- Diff: 
metron-platform/metron-writer/src/main/java/org/apache/metron/writer/hdfs/HdfsWriter.java
 ---
@@ -74,17 +91,43 @@ public BulkWriterResponse write(String sourceType
) throws Exception
   {
 BulkWriterResponse response = new BulkWriterResponse();
-SourceHandler handler = 
getSourceHandler(configurations.getIndex(sourceType));
+// Currently treating all the messages in a group for pass/failure.
 try {
-  handler.handle(messages);
-} catch(Exception e) {
+  // Messages can all result in different HDFS paths, because of 
Stellar Expressions, so we'll need to iterate through
+  for(JSONObject message : messages) {
+Map val = 
configurations.getSensorConfig(sourceType);
+String path = getHdfsPathExtension(
+sourceType,
+
(String)configurations.getSensorConfig(sourceType).getOrDefault(IndexingConfigurations.OUTPUT_PATH_FUNCTION_CONF,
 ""),
+message
+);
+SourceHandler handler = getSourceHandler(sourceType, path);
+handler.handle(message);
+  }
+} catch (Exception e) {
   response.addAllErrors(e, tuples);
 }
 
 response.addAllSuccesses(tuples);
 return response;
   }
 
+  public String getHdfsPathExtension(String sourceType, String 
stellarFunction, JSONObject message) {
+// If no function is provided, just use the sourceType directly
+if(stellarFunction == null || stellarFunction.trim().isEmpty()) {
+  return sourceType;
+}
+
+StellarCompiler.Expression expression = 
sourceTypeExpressionMap.computeIfAbsent(stellarFunction, s -> 
stellarProcessor.compile(stellarFunction));
+VariableResolver resolver = new MapVariableResolver(message);
--- End diff --

@cestella Made that change.  I did make the check 
`if(objResult != null && !(objResult instanceof String))`, to avoid falling 
into the IAE when objResult is null.
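
Spelled out in Python for illustration (the real code is Java in HdfsWriter), the check looks roughly like the sketch below. The function name is hypothetical, `ValueError` stands in for the IAE, and passing a null result straight through to the caller is an assumption; how null is ultimately handled is not shown here.

```python
def checked_path_result(stellar_function, result):
    """Reject non-string results, but let None through without raising."""
    if result is not None and not isinstance(result, str):
        raise ValueError(
            'Stellar function <%s> did not return a string: %r'
            % (stellar_function, result))
    # None is passed through on purpose; only wrong-typed values are errors.
    return result
```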




[GitHub] incubator-metron issue #506: METRON-818: Ambari elasticsearch.properties tem...

2017-04-03 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/506
  
Full dev spun up and ran fine, and I see items showing up in ES and HDFS




[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

2017-04-03 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/505#discussion_r109438625
  
--- Diff: 
metron-platform/metron-writer/src/main/java/org/apache/metron/writer/hdfs/HdfsWriter.java
 ---
@@ -74,17 +91,43 @@ public BulkWriterResponse write(String sourceType
) throws Exception
   {
 BulkWriterResponse response = new BulkWriterResponse();
-SourceHandler handler = 
getSourceHandler(configurations.getIndex(sourceType));
+// Currently treating all the messages in a group for pass/failure.
 try {
-  handler.handle(messages);
-} catch(Exception e) {
+  // Messages can all result in different HDFS paths, because of 
Stellar Expressions, so we'll need to iterate through
+  for(JSONObject message : messages) {
+Map val = 
configurations.getSensorConfig(sourceType);
+String path = getHdfsPathExtension(
+sourceType,
+
(String)configurations.getSensorConfig(sourceType).getOrDefault(IndexingConfigurations.OUTPUT_PATH_FUNCTION_CONF,
 ""),
+message
+);
+SourceHandler handler = getSourceHandler(sourceType, path);
+handler.handle(message);
+  }
+} catch (Exception e) {
   response.addAllErrors(e, tuples);
 }
 
 response.addAllSuccesses(tuples);
 return response;
   }
 
+  public String getHdfsPathExtension(String sourceType, String 
stellarFunction, JSONObject message) {
+// If no function is provided, just use the sourceType directly
+if(stellarFunction == null || stellarFunction.trim().isEmpty()) {
+  return sourceType;
+}
+
+StellarCompiler.Expression expression = 
sourceTypeExpressionMap.computeIfAbsent(stellarFunction, s -> 
stellarProcessor.compile(stellarFunction));
+VariableResolver resolver = new MapVariableResolver(message);
--- End diff --

@cestella I'm mostly concerned about the performance of compiling the function 
on every single message that comes through indexing.

If we keep the current approach, I would be interested in whether there's a way 
to make things a little cleaner.

In retrospect, I think this should be an LRU cache, so that we don't keep 
around a given parse forever. Any thoughts on that, assuming performance would 
be enough of a concern to not just use your proposal?  
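
As a rough illustration of the LRU idea, here is a Python 3 sketch rather than the Java that would actually go into HdfsWriter. `compile_expression` is a stand-in for the Stellar compile step, the `COMPILE_CALLS` list exists only to make cache misses visible, and the tiny `maxsize` is arbitrary.

```python
from functools import lru_cache

# Records every real compilation so cache hits and misses are observable.
COMPILE_CALLS = []


@lru_cache(maxsize=2)
def compile_expression(stellar_function):
    """Stand-in for the expensive compile; cached per function string."""
    COMPILE_CALLS.append(stellar_function)
    return ('compiled', stellar_function)
```

Repeated calls with the same function string hit the cache, and the least recently used entry is dropped once more than `maxsize` distinct strings have been seen, so an expression that stops showing up eventually gets evicted instead of being kept forever.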




[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

2017-04-03 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/505#discussion_r109432502
  
--- Diff: 
metron-platform/metron-writer/src/main/java/org/apache/metron/writer/hdfs/HdfsWriter.java
 ---
@@ -74,17 +91,43 @@ public BulkWriterResponse write(String sourceType
) throws Exception
   {
 BulkWriterResponse response = new BulkWriterResponse();
-SourceHandler handler = 
getSourceHandler(configurations.getIndex(sourceType));
+// Currently treating all the messages in a group for pass/failure.
 try {
-  handler.handle(messages);
-} catch(Exception e) {
+  // Messages can all result in different HDFS paths, because of 
Stellar Expressions, so we'll need to iterate through
+  for(JSONObject message : messages) {
+Map val = 
configurations.getSensorConfig(sourceType);
+String path = getHdfsPathExtension(
+sourceType,
+
(String)configurations.getSensorConfig(sourceType).getOrDefault(IndexingConfigurations.OUTPUT_PATH_FUNCTION_CONF,
 ""),
+message
+);
+SourceHandler handler = getSourceHandler(sourceType, path);
+handler.handle(message);
+  }
+} catch (Exception e) {
   response.addAllErrors(e, tuples);
 }
 
 response.addAllSuccesses(tuples);
 return response;
   }
 
+  public String getHdfsPathExtension(String sourceType, String 
stellarFunction, JSONObject message) {
+// If no function is provided, just use the sourceType directly
+if(stellarFunction == null || stellarFunction.trim().isEmpty()) {
+  return sourceType;
+}
+
+StellarCompiler.Expression expression = 
sourceTypeExpressionMap.computeIfAbsent(stellarFunction, s -> 
stellarProcessor.compile(stellarFunction));
+VariableResolver resolver = new MapVariableResolver(message);
--- End diff --

Unfortunately, I don't think we can, unless we want to do more work to 
actually look up the function and validate. On top of it, things like MAP_GET 
essentially return Object anyway, so we'd still want to check if it's a String 
afterwards.




[GitHub] incubator-metron issue #506: METRON-818: Ambari elasticsearch.properties tem...

2017-04-03 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/506
  
@mmiklavc We probably should edit the Solr config.  That isn't carried 
through Ambari, so we don't have the same concern as here.  However, it does 
look like `storm.auto.credentials=[]` got added to solr.properties and I 
thought it wasn't necessary there.  Can we just drop that config and add the 
'topology.worker.childopts='?

Should I just go ahead and make that change and add it to this PR?  And if 
I do, do we have any testing plan for Solr or are we just making best effort 
fixes?




[GitHub] incubator-metron pull request #506: METRON-818: Ambari elasticsearch.propert...

2017-04-03 Thread justinleet
GitHub user justinleet opened a pull request:

https://github.com/apache/incubator-metron/pull/506

METRON-818: Ambari elasticsearch.properties template is missing 
topology.worker.childopts

## Contributor Comments
Adding the empty config to the Ambari elasticsearch.properties template.

To test, spin up in a dev environment.  Indexing topology should produce 
results instead of an error in the logs now.

I'm still running this up in dev, but wanted to let people see what's going 
on.  Will update shortly.

As a workaround, just add this line to a running Ambari instance and 
restart indexing in Ambari to push the configs.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron (Incubating).  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
- [x] Does your PR title start with METRON- where  is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [ ] Have you ensured that the full suite of tests and checks have been 
executed in the root incubating-metron folder via:
  ```
  mvn -q clean integration-test install && build_utils/verify_licenses.sh 
  ```

- ~Have you written or updated unit tests and or integration tests to 
verify your changes?~
- ~If adding new dependencies to the code, are these dependencies licensed 
in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?~
- [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- ~Have you ensured that format looks appropriate for the output in which 
it is rendered by building and verifying the site-book? If not then run the 
following commands and the verify changes~
 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up for 
your personal repository such that your branches are built there before 
submitting a pull request.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/justinleet/incubator-metron ambari_config_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-metron/pull/506.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #506


commit d1fd6cf59433747a3cca503df552fcd0f003f488
Author: justinjleet 
Date:   2017-04-03T13:34:50Z

Adding elasticsearch.properties empty config






[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

2017-04-03 Thread justinleet
GitHub user justinleet opened a pull request:

https://github.com/apache/incubator-metron/pull/505

METRON-817: Customise output file path patterns for HDFS indexing

## Contributor Comments
Primarily this affects HdfsWriter, changing the output path from a set 
path (`/apps/metron/.../`) to one that can be defined via a Stellar 
function.  Specifically, the base path is still defined the same (the 
`/apps/metron/.../` portion), but the `` portion is dropped and can now 
be defined by a Stellar function.  By default, the original behavior of 
`` is used.  This is defined in the `.json` file as indicated 
in the new README.md for metron-writer.

### Notes
- This requires tracking things a bit more carefully (and if you're 
reviewing, please validate that it happens correctly).  When the outputFile is 
closed, we remove the sourceHandler from HdfsWriter's map.
  - I'm slightly concerned about the correctness of the implementation, but 
it seems necessary to ensure that we don't leave a bunch of SourceHandlers 
lying around as data changes (and we don't want an enormous number of output 
files being written to).
  - If there's a cleaner way to manage this, I'd love to hear it and can 
refactor pretty easily. It throws off the rotation count (because we kill the 
SourceHandler from the map itself), but I doubt we care about that since it 
really only shows up in the output filename anyway.
- This also adds an argument for max open files.  This is a flux level 
config. I defaulted this to 500.  500 was chosen because it was an arbitrary 
round number that wasn't enormous.
  - If someone has a default with any real reasoning behind it, I'll go 
ahead and change it.
- In HdfsWriter, we iterate through the messages, apply the Stellar 
function and then call the relevant handler. The entire group of messages is 
treated as one single pass/fail (which is the same as the old behavior), rather 
than individually. The try/catch could potentially be moved into the for loop, 
but I don't think there's an explicit link between the message and the tuples 
that we can exploit to fail per message.  I don't think it needs to be 
addressed here, but I'm curious if there's thought on this.
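
For reviewers who want a mental model of the handler tracking described in the notes above, here is a small Python sketch. It is not the Java in the PR: `HandlerRegistry` is a hypothetical name, a plain list stands in for a SourceHandler, and raising once the limit is reached is an assumption about how the max open files setting behaves.

```python
class HandlerRegistry(object):
    """Tracks one handler per (source type, path), up to a fixed limit."""

    def __init__(self, max_open_files=500):
        self.max_open_files = max_open_files
        self._handlers = {}

    def get(self, source_type, path):
        key = (source_type, path)
        if key not in self._handlers:
            # Assumed behavior: refuse to open more handlers than the limit.
            if len(self._handlers) >= self.max_open_files:
                raise RuntimeError('too many open files: %d' % len(self._handlers))
            self._handlers[key] = []  # stand-in for a real SourceHandler
        return self._handlers[key]

    def close(self, source_type, path):
        # Dropping the entry when its file closes keeps stale handlers from piling up.
        self._handlers.pop((source_type, path), None)

    def open_count(self):
        return len(self._handlers)
```

The point of `close` removing the entry is the same as in the notes: as the data changes, paths that are no longer written to should not keep a handler alive.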

### Testing
Unit tests are added to pretty much cover HdfsWriter, and this can be spun 
up in a dev environment.

To test in dev

- Spin up a dev environment
- Validate that the output matches the old format in HDFS (Nothing has an 
output function defined)
  ```
  [hdfs@node1 vagrant]$ hdfs dfs -ls /apps/metron/indexing/indexed/
  Found 3 items
  drwxrwxr-x   - storm hadoop  0 2017-04-03 13:11 /apps/metron/indexing/indexed/bro
  drwxrwxr-x   - storm hadoop  0 2017-04-03 13:11 /apps/metron/indexing/indexed/error
  drwxrwxr-x   - storm hadoop  0 2017-04-03 13:11 /apps/metron/indexing/indexed/snort
  ```
- Edit the indexing config for Bro to include an outputPathFunction in the 
hdfs section, e.g. in `/usr/metron/0.3.1/config/zookeeper/indexing/bro.json`
  ```
  {
    "hdfs" : {
      "index": "bro",
      "batchSize": 5,
      "enabled" : true,
      "outputPathFunction": "FORMAT('ipsrc-%s', ip_src_addr)"
    },
    "elasticsearch" : {
      "index": "bro",
      "batchSize": 5,
      "enabled" : true
    },
    "solr" : {
      "index": "bro",
      "batchSize": 5,
      "enabled" : true
    }
  }
  ```
- Push the configs to ZooKeeper: 
`/usr/metron/0.3.1/bin/zk_load_configs.sh -z node1:2181 -m PUSH -i /usr/metron/0.3.1/config/zookeeper/`
- Let some more data run through and check the output folders, e.g.
  ```
  [hdfs@node1 vagrant]$ hdfs dfs -ls /apps/metron/indexing/indexed/
  Found 5 items
  drwxrwxr-x   - storm hadoop  0 2017-04-03 13:11 /apps/metron/indexing/indexed/bro
  drwxrwxr-x   - storm hadoop  0 2017-04-03 13:11 /apps/metron/indexing/indexed/error
  drwxrwxr-x   - storm hadoop  0 2017-04-03 13:14 /apps/metron/indexing/indexed/ipsrc-192.168.138.158
  drwxrwxr-x   - storm hadoop  0 2017-04-03 13:14 /apps/metron/indexing/indexed/ipsrc-192.168.66.1
  drwxrwxr-x   - storm hadoop  0 2017-04-03 13:11 /apps/metron/indexing/indexed/snort
  [hdfs@node1 vagrant]$ hdfs dfs -ls /apps/metron/indexing/indexed/ipsrc-192.168.138.158
  Found 1 items
  -rw-r--r--   1 storm hadoop 223182 2017-04-03 13:14 /apps/metron/indexing/indexed/ipsrc-192.168.138.158/enrichment-null-0-0-1491225291377.json
  ```
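
The way messages fan out into directories in the listing above can be sketched in a few lines of Python. This is only an illustration of the behavior: `group_by_path` and `ipsrc_path` are hypothetical names, and `ipsrc_path` mimics what `FORMAT('ipsrc-%s', ip_src_addr)` produces instead of evaluating any Stellar.

```python
from collections import defaultdict


def ipsrc_path(message):
    """Python stand-in for the Stellar FORMAT('ipsrc-%s', ip_src_addr)."""
    return 'ipsrc-%s' % message['ip_src_addr']


def group_by_path(source_type, messages, path_function=None):
    """Group messages by output path; fall back to the source type."""
    groups = defaultdict(list)
    for message in messages:
        # With no function configured, everything lands under the source type.
        path = source_type if path_function is None else path_function(message)
        groups[path].append(message)
    return dict(groups)
```

With no function, all Bro messages stay under `bro` as before; with `ipsrc_path`, each distinct source address gets its own directory, which is why the number of open files needs a cap.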

## Pull Request Checklist

Thank you for submitting a contributio

[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-27 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
@cestella My +1 stands with the testing issues ironed out.  Thanks for 
looking into it.




[GitHub] incubator-metron issue #488: METRON-796: Mpack uses wrong group for owning H...

2017-03-26 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/488
  
https://issues.apache.org/jira/browse/METRON-349 is updated to be more 
complete and reflect the current state of things.




[GitHub] incubator-metron issue #488: METRON-796: Mpack uses wrong group for owning H...

2017-03-26 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/488
  
@dlyle65535 Failure mode is that HDFS writes from Storm fail.  The 
directories are owned by metron:metron with 775.  Storm isn't in the metron 
group, so it fails to write.  The perms exception is thrown in the bolt and no 
output file is created.  Writes to ES work as expected.  Validating this is 
pretty easy: just run up full dev and see if the perms error shows up and if 
HDFS gets any files.
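
The failure mode comes down to a plain owner/group/other check, sketched here in Python. This is a simplified POSIX-style model (it ignores superusers and HDFS ACLs), `can_write` is a hypothetical helper, and the example group memberships are assumptions for illustration.

```python
def can_write(mode, owner, group, user, user_groups):
    """Simplified POSIX write check: owner bits, then group bits, then other."""
    if user == owner:
        return bool(mode & 0o200)
    if group in user_groups:
        return bool(mode & 0o020)
    return bool(mode & 0o002)
```

For a directory owned by metron:metron with 775, a storm user that is only in the hadoop group falls through to the "other" bits and is denied; the same mode with the hadoop group as owner lets the write through.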




[GitHub] incubator-metron issue #488: METRON-796: Mpack uses wrong group for owning H...

2017-03-24 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/488
  
@mattf-horton Permissions on the other items are generally 775, so they can 
be read as needed (and should be scaled back once we have everything lined up 
with the user as @simonellistonball mentions). I just wanted to touch things as 
little as possible to get them back in a working state.

We should either expand out or replace METRON-349 (including closing off 
permissions) as the actual solution to the problem.




[GitHub] incubator-metron pull request #488: METRON-796: Mpack uses wrong group for o...

2017-03-24 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/488#discussion_r107965896
  
--- Diff: 
metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/params/params_linux.py
 ---
@@ -39,7 +39,7 @@
 tmp_dir = Script.get_tmp_dir()
 
 hostname = config['hostname']
-metron_group = config['configurations']['cluster-env']['metron_group']
+hadoop_group = config['configurations']['cluster-env']['user_group']
--- End diff --

Updated with a comment to clarify things a bit.  Let me know if you think 
there's anything else we want to add or clarify.




[GitHub] incubator-metron issue #488: METRON-796: Mpack uses wrong group for owning H...

2017-03-24 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/488
  
For anybody looking, Matt's review comment is still relevant to discussion, 
but unfortunately hidden by GitHub thinking it's outdated after the last commit.




[GitHub] incubator-metron pull request #488: METRON-796: Mpack uses wrong group for o...

2017-03-24 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/488#discussion_r107961358
  
--- Diff: 
metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/params/params_linux.py
 ---
@@ -39,7 +39,7 @@
 tmp_dir = Script.get_tmp_dir()
 
 hostname = config['hostname']
-metron_group = config['configurations']['cluster-env']['metron_group']
+hadoop_group = config['configurations']['cluster-env']['user_group']
--- End diff --

I'll go ahead and move the config.

On the group issue, that is the group named 'hadoop'.  The cluster-level 
config is named 'user_group'; I have no idea why.  I only called it 
'hadoop_group' here so that it's more obvious it shouldn't be removed in the 
future.  If there are objections to calling it 'hadoop_group', I could also 
carry it through as user_group and add a comment about the meaning in the 
params file.

For example, in HDP stack 
https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/stacks/HDP/2.0.6/configuration/cluster-env.xml#L158
```
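<name>user_group</name>
<display-name>Hadoop Group</display-name>
<value>hadoop</value>
<property-type>GROUP</property-type>
<description>Hadoop user group.</description>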
```

This declaration carried through a couple other stack definitions that I 
looked at.

The use of this group also seems fairly common, e.g. in 
https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs.py#L60
```
  if "hadoop-policy" in params.config['configurations']:
    XmlConfig("hadoop-policy.xml",
              conf_dir=params.hadoop_conf_dir,
              configurations=params.config['configurations']['hadoop-policy'],
              configuration_attributes=params.config['configuration_attributes']['hadoop-policy'],
              owner=params.hdfs_user,
              group=params.user_group
    )
```




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-24 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
@cestella Just noticed Travis after I commented.  I'm moderately surprised 
that the most recent PR would break it, do you know what the issue is?




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-24 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
+1, was able to follow Mike's instructions, with a couple caveats.

- Group authorization command was missing
```
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh \
  --authorizer kafka.security.auth.SimpleAclAuthorizer \
  --authorizer-properties zookeeper.connect=node1:2181 \
  --add \
  --allow-principal User:storm-metron_cluster \
  --allow-principal User:justin \
  --group jsonMap_parser
```
- Topic authorization command on the enrichments topic side was missing.
```
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh \
  --authorizer kafka.security.auth.SimpleAclAuthorizer \
  --authorizer-properties zookeeper.connect=node1:2181 \
  --add \
  --allow-principal User:storm-metron_cluster \
  --allow-principal User:justin \
  --topic enrichments
```




[GitHub] incubator-metron pull request #486: METRON-793: Migrate to storm-kafka-clien...

2017-03-24 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/486#discussion_r107913632
  
--- Diff: 
metron-platform/metron-parsers/src/main/java/org/apache/metron/parsers/topology/ParserTopologyBuilder.java
 ---
@@ -106,19 +105,22 @@ public static TopologyBuilder build(String 
zookeeperUrl,
   /**
* Create a spout that consumes tuples from a Kafka topic.
*
-   * @param zookeeperUrlZookeeper URL
+   * @param zkQuorum Zookeeper URL
* @param sensorType  Type of sensor
-   * @param offset  Kafka topic offset where the topology 
will start; BEGINNING, END, WHERE_I_LEFT_OFF
-   * @param kafkaSpoutConfigOptions Configuration options for the kafka 
spout
+   * @param kafkaConfigOptional Configuration options for the kafka 
spout
* @param parserConfigConfiguration for the parser
* @return
*/
-  private static KafkaSpout createKafkaSpout(String zookeeperUrl, String 
sensorType, SpoutConfig.Offset offset, EnumMap 
kafkaSpoutConfigOptions, SensorParserConfig parserConfig) {
-
+  private static StormKafkaSpout createKafkaSpout(String zkQuorum, String 
sensorType, Optional> kafkaConfigOptional, 
SensorParserConfig parserConfig) {
--- End diff --

StormKafkaSpout's return type here will actually be StormKafkaSpout, right?  Can we make that explicit, rather than untyped (and also drop 
the Object.class from the creates assuming the other typing change)?




[GitHub] incubator-metron pull request #486: METRON-793: Migrate to storm-kafka-clien...

2017-03-24 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/486#discussion_r107912514
  
--- Diff: 
metron-platform/metron-storm-kafka/src/main/java/org/apache/metron/storm/kafka/flux/SimpleStormKafkaBuilder.java
 ---
@@ -0,0 +1,234 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.metron.storm.kafka.flux;
+
+import com.google.common.base.Joiner;
+import org.apache.kafka.clients.consumer.Consumer;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.common.serialization.ByteArrayDeserializer;
+import org.apache.metron.common.utils.KafkaUtils;
+import org.apache.storm.kafka.spout.*;
+import org.apache.storm.spout.SpoutOutputCollector;
+import org.apache.storm.topology.OutputFieldsDeclarer;
+import org.apache.storm.topology.OutputFieldsGetter;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Values;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Function;
+
+/**
+ * This is a convenience layer on top of the KafkaSpoutConfig.Builder 
available in storm-kafka-client.
+ * The justification for this class is two-fold.  First, there are a lot 
of moving parts and a simplified
+ * approach to constructing spouts is useful.  Secondly, and perhaps more 
importantly, the Builder pattern
+ * is decidedly unfriendly to use inside of Flux.  Finally, we can make 
things a bit more friendly by only requiring
+ * zookeeper and automatically figuring out the brokers for the bootstrap 
server.
+ *
+ * @param  The kafka key type
+ * @param  The kafka value type
+ */
+public class SimpleStormKafkaBuilder extends 
KafkaSpoutConfig.Builder {
+  final static String STREAM = "default";
+
+  /**
+   * The fields exposed by the kafka consumer.  These will show up in the 
Storm tuple.
+   */
+  public enum FieldsConfiguration {
+KEY("key", record -> record.key()),
+VALUE("value", record -> record.value()),
+PARTITION("partition", record -> record.partition()),
+TOPIC("topic", record -> record.topic())
+;
+String fieldName;
+Function recordExtractor;
+
+FieldsConfiguration(String fieldName, Function 
recordExtractor) {
+  this.recordExtractor = recordExtractor;
+  this.fieldName = fieldName;
+}
+
+/**
+ * Return a list of the enums
+ * @param configs
+ * @return
+ */
+public static List toList(String... configs) {
+  List ret = new ArrayList<>();
+  for(String config : configs) {
+ret.add(FieldsConfiguration.valueOf(config.toUpperCase()));
+  }
+  return ret;
+}
+
+/**
+ * Return a list of the enums from their string representation.
+ * @param configs
+ * @return
+ */
+public static List toList(List configs) {
+  List ret = new ArrayList<>();
+  for(String config : configs) {
+ret.add(FieldsConfiguration.valueOf(config.toUpperCase()));
+  }
+  return ret;
+}
+
+/**
+ * Construct a Fields object from an iterable of enums.  These fields 
are the fields
+ * exposed in the Storm tuple emitted from the spout.
+ * @param configs
+ * @return
+ */
+public static Fields getFields(Iterable configs) {
+  List fields = new ArrayList<>();
+  for(FieldsConfiguration config : configs) {
+fields.add(config.fieldName);
+  }
+  return new Fields(fields);
+}
+  }
+
+  /**
+   * Build a tuple given the fields and the topic.  We want to use our 
FieldsConfiguration enum
+   * to define what this tuple looks like.
+   * @param  T

[GitHub] incubator-metron issue #488: METRON-796: Mpack uses wrong group for owning H...

2017-03-24 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/488
  
I agree the topologies should run as the metron user, but this is just to 
get things back in a working state again (and it already used to be this way, 
so this isn't opening things up more than it was a couple weeks ago).

I actually thought there was a separate Jira for running as the Metron 
user, but the one I was thinking of is 
https://issues.apache.org/jira/browse/METRON-349.  The ticket should really be 
to consolidate everything under the metron user with appropriate ownership.  I 
don't have a preference between updating that ticket and closing it in favor 
of a new one (and I'm not sure which the community would prefer).




[GitHub] incubator-metron pull request #488: METRON-796: Mpack uses wrong group for o...

2017-03-24 Thread justinleet
GitHub user justinleet opened a pull request:

https://github.com/apache/incubator-metron/pull/488

METRON-796: Mpack uses wrong group for owning HDFS directories

## Contributor Comments
Reverts the group owner of a couple HDFS directories to be the hadoop 
group, rather than the metron group (which is just metron).  Right now, the 
topologies run as the storm user (which belongs to the hadoop group), and 
therefore don't have permission to write to HDFS (including in quick and full 
dev).  This sets HDFS ownership to metron:hadoop, which lets writes be handled 
appropriately.

Other items, such as configs and installation files, were just left as the 
metron group.

To test, just spin up a dev environment and ensure files are being written 
and ownership makes sense (/apps/metron/indexing/indexed is metron:hadoop with 
775 perms).  The individual sensor directories will be owned by storm:hadoop 
(proving that writes work).

For example:
```
[vagrant@node1 ~]$ hdfs dfs -ls /apps/metron/indexing
Found 1 items
drwxrwxr-x   - metron hadoop  0 2017-03-24 12:57 /apps/metron/indexing/indexed
[vagrant@node1 ~]$ hdfs dfs -ls /apps/metron/indexing/indexed
Found 3 items
drwxrwxr-x   - storm hadoop  0 2017-03-24 13:01 /apps/metron/indexing/indexed/bro
drwxrwxr-x   - storm hadoop  0 2017-03-24 13:01 /apps/metron/indexing/indexed/error
drwxrwxr-x   - storm hadoop  0 2017-03-24 13:01 /apps/metron/indexing/indexed/snort
[vagrant@node1 ~]$ hdfs dfs -ls /apps/metron/indexing/indexed/bro
Found 1 items
-rw-r--r--   1 storm hadoop 211393 2017-03-24 13:01 /apps/metron/indexing/indexed/bro/enrichment-null-0-0-1490360489968.json
```
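
For anyone who wants to script the ownership check rather than eyeball it, here is a rough sketch that parses a single line of `hdfs dfs -ls` output and reports owner, group, and octal mode. The class name `HdfsLsLineSketch` is hypothetical and not part of Metron; it assumes the eight-column layout shown in the listing above and only understands plain r/w/x bits (no sticky or setuid flags).

```java
/**
 * Hypothetical helper (not part of Metron): parses one line of
 * `hdfs dfs -ls` output in the eight-column layout shown above.
 */
final class HdfsLsLineSketch {
  final String permissions;
  final String owner;
  final String group;
  final String path;

  private HdfsLsLineSketch(String permissions, String owner, String group, String path) {
    this.permissions = permissions;
    this.owner = owner;
    this.group = group;
    this.path = path;
  }

  /** Columns: permissions, replication, owner, group, size, date, time, path. */
  static HdfsLsLineSketch parse(String line) {
    String[] parts = line.trim().split("\\s+", 8);
    if (parts.length != 8 || parts[0].length() < 10) {
      throw new IllegalArgumentException("Unexpected listing line: " + line);
    }
    return new HdfsLsLineSketch(parts[0], parts[2], parts[3], parts[7]);
  }

  /** Converts e.g. "drwxrwxr-x" to "775"; only plain r/w/x bits are handled. */
  String octalMode() {
    StringBuilder mode = new StringBuilder();
    for (int i = 1; i <= 7; i += 3) {
      int digit = 0;
      if (permissions.charAt(i) == 'r') digit += 4;
      if (permissions.charAt(i + 1) == 'w') digit += 2;
      if (permissions.charAt(i + 2) == 'x') digit += 1;
      mode.append(digit);
    }
    return mode.toString();
  }

  public static void main(String[] args) {
    HdfsLsLineSketch entry = parse(
        "drwxrwxr-x   - metron hadoop  0 2017-03-24 12:57 /apps/metron/indexing/indexed");
    System.out.println(entry.path + " " + entry.owner + ":" + entry.group + " " + entry.octalMode());
  }
}
```

A check along these lines could be run against each line of the listing to confirm the expected owner and group.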

As a note, metron_group existed twice in params_linux.py, so only the first 
instance is changed to hadoop_group and pulled appropriately.  The second is 
left as-is.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron (Incubating).  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root incubating-metron folder via:
  ```
  mvn -q clean integration-test install && build_utils/verify_licenses.sh 
  ```

- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up for 
your personal repository such that your branches are built there before 
submitting a pull request.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/justinleet/incubator-metron METRON-796

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-metron/pull/488.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #488


commit ec7d070524334603e1712b4649e5600ded450284
Author: justinjleet 
Date:   2017-03-24T00:55:23Z

Updating group perms






[GitHub] incubator-metron pull request #486: METRON-793: Migrate to storm-kafka-clien...

2017-03-23 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/486#discussion_r107683392
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/HDFSWriterCallback.java
 ---
@@ -96,16 +109,18 @@ public HDFSWriterCallback withConfig(HDFSWriterConfig 
config) {
 this.config = config;
 return this;
 }
+
 @Override
 public List apply(List tuple, EmitContext context) {
-
-List keyValue = (List) tuple.get(0);
-LongWritable ts = (LongWritable) keyValue.get(0);
-BytesWritable rawPacket = (BytesWritable)keyValue.get(1);
+byte[] key = (byte[]) tuple.get(0);
+byte[] value = (byte[]) tuple.get(1);
+if(!config.getDeserializer().deserializeKeyValue(key, value, 
KeyValue.key.get(), KeyValue.value.get())) {
+LOG.debug("Dropping malformed packet...");
--- End diff --

I'm good with that. I had mostly discarded the worry about size because 
this is in a debug statement, and with Storm you're typically setting 
reasonable timeouts on logging levels anyway.




[GitHub] incubator-metron issue #487: METRON-792: Quick Dev should remove/replace RPM...

2017-03-23 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/487
  
I'm +1 by inspection




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-23 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
@cestella I'm good with keeping the extension points especially after the 
points you made. I think the TODOs are valuable, I just wanted to know the 
thought behind potentially building it out.

Given the API instability, unfortunately it seems like our dependencies 
aren't going to provide that insulation layer.  I'd rather have that be 
provided in a stable manner upstream from us, but that's not something we have 
any control over.




[GitHub] incubator-metron pull request #486: METRON-793: Migrate to storm-kafka-clien...

2017-03-22 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/486#discussion_r107438798
  
--- Diff: 
metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/integration/components/ParserTopologyComponent.java
 ---
@@ -97,6 +99,19 @@ public void start() throws UnableToStartException {
   public void stop() {
 if(stormCluster != null) {
   stormCluster.shutdown();
+  if(new File("logs/workers-artifacts").exists()) {
+Path rootPath = Paths.get("logs");
+Path destPath = Paths.get("target/logs");
+try {
+  Files.move(rootPath, destPath);
+  Files.walk(destPath)
--- End diff --

Same deal with FileUtils.deleteDirectory() here
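
For comparison, this is roughly what the hand-rolled `Files.walk` cleanup has to get right, and what a single call to commons-io's `FileUtils.deleteDirectory(File)` would replace. It is a JDK-only sketch; `DeleteDirectorySketch` and its sample directory names are hypothetical and not taken from the PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical JDK-only sketch of a recursive directory delete. */
final class DeleteDirectorySketch {

  /** Deletes root and everything under it; children are removed before parents. */
  static void deleteRecursively(Path root) {
    if (!Files.exists(root)) {
      return;
    }
    // Files.walk returns a lazily populated stream that must be closed.
    try (Stream<Path> paths = Files.walk(root)) {
      // Reverse order puts the deepest entries first, so each directory is empty when deleted.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Builds a small throwaway tree under the system temp directory for the demo. */
  static Path createSampleTree() {
    try {
      Path root = Files.createTempDirectory("logs-sketch");
      Path nested = Files.createDirectories(root.resolve("workers-artifacts").resolve("topology-1"));
      Files.write(nested.resolve("worker.log"), "hello".getBytes(StandardCharsets.UTF_8));
      return root;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path root = createSampleTree();
    deleteRecursively(root);
    System.out.println("still exists: " + Files.exists(root));
  }
}
```

The library call avoids having to remember the stream close and the ordering detail, which is the main argument for it.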




[GitHub] incubator-metron pull request #486: METRON-793: Migrate to storm-kafka-clien...

2017-03-22 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/486#discussion_r107522830
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/HDFSWriterCallback.java
 ---
@@ -96,16 +109,18 @@ public HDFSWriterCallback withConfig(HDFSWriterConfig 
config) {
 this.config = config;
 return this;
 }
+
 @Override
 public List apply(List tuple, EmitContext context) {
-
-List keyValue = (List) tuple.get(0);
-LongWritable ts = (LongWritable) keyValue.get(0);
-BytesWritable rawPacket = (BytesWritable)keyValue.get(1);
+byte[] key = (byte[]) tuple.get(0);
+byte[] value = (byte[]) tuple.get(1);
+if(!config.getDeserializer().deserializeKeyValue(key, value, 
KeyValue.key.get(), KeyValue.value.get())) {
+LOG.debug("Dropping malformed packet...");
--- End diff --

Is it reasonable to include the key and value we're having issues with in 
the debug statement?
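
If log size is a concern, one option would be to log a bounded hex preview rather than the full byte arrays. A minimal sketch, assuming a hypothetical helper (`PacketLogSketch.preview` is not part of Metron, and the byte limits are arbitrary):

```java
/** Hypothetical helper (not part of Metron) for logging a bounded view of raw bytes. */
final class PacketLogSketch {

  /** Renders at most maxBytes of data as hex, followed by the total length. */
  static String preview(byte[] data, int maxBytes) {
    if (data == null) {
      return "null";
    }
    int shown = Math.min(data.length, Math.max(maxBytes, 0));
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < shown; i++) {
      // Mask to avoid sign extension when formatting negative bytes.
      sb.append(String.format("%02x", data[i] & 0xff));
    }
    if (shown < data.length) {
      sb.append("...");
    }
    return sb.append(" (").append(data.length).append(" bytes)").toString();
  }

  public static void main(String[] args) {
    byte[] key = {0x0a, (byte) 0xff};
    byte[] value = {1, 2, 3};
    System.out.println("key=" + preview(key, 8) + ", value=" + preview(value, 2));
  }
}
```

With an SLF4J-style logger (an assumption about what `LOG` is here), the call site could then look like `LOG.debug("Dropping malformed packet: key={}, value={}", PacketLogSketch.preview(key, 16), PacketLogSketch.preview(value, 64));`.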




[GitHub] incubator-metron pull request #486: METRON-793: Migrate to storm-kafka-clien...

2017-03-22 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/486#discussion_r107524059
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/deserializer/Deserializers.java
 ---
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.metron.spout.pcap.deserializer;
+
+import org.apache.metron.common.utils.timestamp.TimestampConverters;
+import org.apache.metron.common.utils.timestamp.TimestampConverter;
+
+import java.util.function.Function;
+
+/**
+ * Deserializers take the raw bytes from kafka key and value and construct 
the timestamp and raw bytes for PCAP.
+ */
+public enum Deserializers {
+  /**
+   * Extract the timestamp from the key and the raw packet 
(global-headerless) from the value
+   */
+   FROM_KEY( converter -> new FromKeyDeserializer(converter))
+  /**
+   * Ignore the key and pull the timestamp directly from the packet 
itself.  Also, assume that the packet isn't global-headerless.
+   */
+  ,FROM_PACKET(converter -> new FromPacketDeserializer());
+  ;
+  Function creator;
+  Deserializers(Function creator)
+  {
+this.creator = creator;
+  }
+
+  public static KeyValueDeserializer create(String scheme, 
TimestampConverter converter) {
+try {
+  Deserializers ts = Deserializers.valueOf(scheme.toUpperCase());
+  return ts.creator.apply(converter);
+}
+catch(IllegalArgumentException iae) {
+  return Deserializers.FROM_KEY.creator.apply(converter);
+}
+  }
+
+  public static KeyValueDeserializer create(String scheme, String 
converter) {
+return create(scheme, 
TimestampConverters.valueOf(converter.toUpperCase()));
--- End diff --

Shouldn't this be a call to TimestampConverters.getConverter()?

And if we're uppercasing things, shouldn't that happen in the TimestampConverter 
(given that it's meant to match an Enum value)?
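
To illustrate the idea, here is a sketch of an enum that owns its own case normalization, so callers never uppercase anything themselves. `TimestampConverterSketch` and its constants are hypothetical stand-ins and do not claim to mirror the real `TimestampConverters` enum.

```java
import java.util.Locale;

/** Hypothetical stand-in for an enum of timestamp converters. */
enum TimestampConverterSketch {
  MILLISECONDS(1_000_000L),
  MICROSECONDS(1_000L),
  NANOSECONDS(1L);

  private final long nanosPerUnit;

  TimestampConverterSketch(long nanosPerUnit) {
    this.nanosPerUnit = nanosPerUnit;
  }

  long toNanoseconds(long timestamp) {
    return timestamp * nanosPerUnit;
  }

  /** Single lookup point: trimming and uppercasing happen here, not at call sites. */
  static TimestampConverterSketch getConverter(String name) {
    if (name == null) {
      throw new IllegalArgumentException("converter name must not be null");
    }
    // Locale.ROOT keeps the uppercasing independent of the default locale.
    return valueOf(name.trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(getConverter("milliseconds").toNanoseconds(2L));
  }
}
```

A caller would then pass the configured string straight through, e.g. `TimestampConverterSketch.getConverter(converter)`, with no `toUpperCase()` of its own.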




[GitHub] incubator-metron pull request #486: METRON-793: Migrate to storm-kafka-clien...

2017-03-22 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/486#discussion_r107521465
  
--- Diff: 
metron-platform/metron-parsers/src/main/java/org/apache/metron/parsers/topology/ParserTopologyBuilder.java
 ---
@@ -106,19 +105,26 @@ public static TopologyBuilder build(String 
zookeeperUrl,
   /**
* Create a spout that consumes tuples from a Kafka topic.
*
-   * @param zookeeperUrlZookeeper URL
+   * @param zkQuorum Zookeeper URL
* @param sensorType  Type of sensor
-   * @param offset  Kafka topic offset where the topology 
will start; BEGINNING, END, WHERE_I_LEFT_OFF
-   * @param kafkaSpoutConfigOptions Configuration options for the kafka 
spout
+   * @param kafkaConfigOptional Configuration options for the kafka 
spout
* @param parserConfigConfiguration for the parser
* @return
*/
-  private static KafkaSpout createKafkaSpout(String zookeeperUrl, String 
sensorType, SpoutConfig.Offset offset, EnumMap 
kafkaSpoutConfigOptions, SensorParserConfig parserConfig) {
-
+  private static StormKafkaSpout createKafkaSpout(String zkQuorum, String 
sensorType, Optional> kafkaConfigOptional, 
SensorParserConfig parserConfig) {
+Map kafkaSpoutConfigOptions = 
kafkaConfigOptional.orElse(new HashMap<>());
 String inputTopic = parserConfig.getSensorTopic() != null ? 
parserConfig.getSensorTopic() : sensorType;
-SpoutConfig spoutConfig = new SpoutConfig(new ZkHosts(zookeeperUrl), 
inputTopic, "", inputTopic).from(offset);
-SpoutConfigOptions.configure(spoutConfig, kafkaSpoutConfigOptions);
-return new KafkaSpout(spoutConfig);
+
if(!kafkaSpoutConfigOptions.containsKey(SpoutConfiguration.FIRST_POLL_OFFSET_STRATEGY.key))
 {
--- End diff --

Can you make these putIfAbsent() calls?

e.g.
```
kafkaSpoutConfigOptions.putIfAbsent(SpoutConfiguration.FIRST_POLL_OFFSET_STRATEGY.key,
    KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_EARLIEST.toString());
```
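
As a self-contained illustration of why the two forms are interchangeable here, the sketch below compares the check-then-put pattern against `Map.putIfAbsent`. The key name and values are placeholders for illustration, not Metron's actual constants.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical comparison of containsKey-then-put versus putIfAbsent. */
final class PutIfAbsentSketch {
  // Placeholder key, standing in for SpoutConfiguration.FIRST_POLL_OFFSET_STRATEGY.key.
  static final String OFFSET_KEY = "spout.firstPollOffsetStrategy";
  static final String DEFAULT_STRATEGY = "UNCOMMITTED_EARLIEST";

  /** Verbose form: explicit containsKey check followed by put. */
  static Map<String, Object> withContainsKey(Map<String, Object> options) {
    Map<String, Object> copy = new HashMap<>(options);
    if (!copy.containsKey(OFFSET_KEY)) {
      copy.put(OFFSET_KEY, DEFAULT_STRATEGY);
    }
    return copy;
  }

  /** Compact form: a single putIfAbsent call. */
  static Map<String, Object> withPutIfAbsent(Map<String, Object> options) {
    Map<String, Object> copy = new HashMap<>(options);
    copy.putIfAbsent(OFFSET_KEY, DEFAULT_STRATEGY);
    return copy;
  }

  public static void main(String[] args) {
    Map<String, Object> preset = new HashMap<>();
    preset.put(OFFSET_KEY, "LATEST");
    System.out.println(withPutIfAbsent(new HashMap<>()));
    System.out.println(withPutIfAbsent(preset));
  }
}
```

One difference worth knowing: `putIfAbsent` also fills in the default when the key is present but mapped to `null`, whereas the `containsKey` form leaves the `null` in place.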




[GitHub] incubator-metron pull request #486: METRON-793: Migrate to storm-kafka-clien...

2017-03-22 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/486#discussion_r107534534
  
--- Diff: pom.xml ---
@@ -67,20 +67,44 @@
 
 1.0.1
 1.0.1
-0.10.0.1
+0.10.0
 2.7.1
 1.1.1
-1.8.0
 1.5.2
 
+1.8.0
 4.5
 3.7
 2.7.1
 3.3
-${base_storm_version}
+1.0.3
+
+
1.0.1.2.5.0.0-1245
--- End diff --

I'm sure you've already thought of this, but assuming we do go with this, 
please make sure this gets a JIRA associated with it.




[GitHub] incubator-metron pull request #486: METRON-793: Migrate to storm-kafka-clien...

2017-03-22 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/486#discussion_r107531858
  
--- Diff: 
metron-platform/metron-storm-kafka/src/main/java/org/apache/metron/storm/kafka/flux/SimpleStormKafkaBuilder.java
 ---
@@ -0,0 +1,234 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.metron.storm.kafka.flux;
+
+import com.google.common.base.Joiner;
+import org.apache.kafka.clients.consumer.Consumer;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.common.serialization.ByteArrayDeserializer;
+import org.apache.metron.common.utils.KafkaUtils;
+import org.apache.storm.kafka.spout.*;
+import org.apache.storm.spout.SpoutOutputCollector;
+import org.apache.storm.topology.OutputFieldsDeclarer;
+import org.apache.storm.topology.OutputFieldsGetter;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Values;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Function;
+
+/**
+ * This is a convenience layer on top of the KafkaSpoutConfig.Builder 
available in storm-kafka-client.
+ * The justification for this class is two-fold.  First, there are a lot 
of moving parts and a simplified
+ * approach to constructing spouts is useful.  Secondly, and perhaps more 
importantly, the Builder pattern
+ * is decidedly unfriendly to use inside of Flux.  Finally, we can make 
things a bit more friendly by only requiring
+ * zookeeper and automatically figuring out the brokers for the bootstrap 
server.
+ *
+ * @param  The kafka key type
+ * @param  The kafka value type
+ */
+public class SimpleStormKafkaBuilder extends 
KafkaSpoutConfig.Builder {
+  final static String STREAM = "default";
+
+  /**
+   * The fields exposed by the kafka consumer.  These will show up in the 
Storm tuple.
+   */
+  public enum FieldsConfiguration {
+KEY("key", record -> record.key()),
+VALUE("value", record -> record.value()),
+PARTITION("partition", record -> record.partition()),
+TOPIC("topic", record -> record.topic())
+;
+String fieldName;
+Function recordExtractor;
+
+FieldsConfiguration(String fieldName, Function 
recordExtractor) {
+  this.recordExtractor = recordExtractor;
+  this.fieldName = fieldName;
+}
+
+/**
+ * Return a list of the enums
+ * @param configs
+ * @return
+ */
+public static List toList(String... configs) {
+  List ret = new ArrayList<>();
+  for(String config : configs) {
+ret.add(FieldsConfiguration.valueOf(config.toUpperCase()));
+  }
+  return ret;
+}
+
+/**
+ * Return a list of the enums from their string representation.
+ * @param configs
+ * @return
+ */
+public static List toList(List configs) {
+  List ret = new ArrayList<>();
+  for(String config : configs) {
+ret.add(FieldsConfiguration.valueOf(config.toUpperCase()));
+  }
+  return ret;
+}
+
+/**
+ * Construct a Fields object from an iterable of enums.  These fields 
are the fields
+ * exposed in the Storm tuple emitted from the spout.
+ * @param configs
+ * @return
+ */
+public static Fields getFields(Iterable configs) {
+  List fields = new ArrayList<>();
+  for(FieldsConfiguration config : configs) {
+fields.add(config.fieldName);
+  }
+  return new Fields(fields);
+}
+  }
+
+  /**
+   * Build a tuple given the fields and the topic.  We want to use our 
FieldsConfiguration enum
+   * to define what this tuple looks like.
+   * @param  T

[GitHub] incubator-metron pull request #486: METRON-793: Migrate to storm-kafka-clien...

2017-03-22 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/486#discussion_r107533535
  
--- Diff: 
metron-platform/metron-storm-kafka/src/main/java/org/apache/metron/storm/kafka/flux/SimpleStormKafkaBuilder.java
 ---
@@ -0,0 +1,234 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.metron.storm.kafka.flux;
+
+import com.google.common.base.Joiner;
+import org.apache.kafka.clients.consumer.Consumer;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.common.serialization.ByteArrayDeserializer;
+import org.apache.metron.common.utils.KafkaUtils;
+import org.apache.storm.kafka.spout.*;
+import org.apache.storm.spout.SpoutOutputCollector;
+import org.apache.storm.topology.OutputFieldsDeclarer;
+import org.apache.storm.topology.OutputFieldsGetter;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Values;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Function;
+
+/**
+ * This is a convenience layer on top of the KafkaSpoutConfig.Builder 
available in storm-kafka-client.
+ * The justification for this class is two-fold.  First, there are a lot 
of moving parts and a simplified
+ * approach to constructing spouts is useful.  Secondly, and perhaps more 
importantly, the Builder pattern
+ * is decidedly unfriendly to use inside of Flux.  Finally, we can make 
things a bit more friendly by only requiring
+ * zookeeper and automatically figuring out the brokers for the bootstrap 
server.
+ *
+ * @param <K> The kafka key type
+ * @param <V> The kafka value type
+ */
+public class SimpleStormKafkaBuilder<K, V> extends KafkaSpoutConfig.Builder<K, V> {
+  final static String STREAM = "default";
+
+  /**
+   * The fields exposed by the kafka consumer.  These will show up in the 
Storm tuple.
+   */
+  public enum FieldsConfiguration {
+KEY("key", record -> record.key()),
+VALUE("value", record -> record.value()),
+PARTITION("partition", record -> record.partition()),
+TOPIC("topic", record -> record.topic())
+;
+String fieldName;
+Function recordExtractor;
+
+FieldsConfiguration(String fieldName, Function 
recordExtractor) {
+  this.recordExtractor = recordExtractor;
+  this.fieldName = fieldName;
+}
+
+/**
+ * Return a list of the enums
+ * @param configs
+ * @return
+ */
+public static List toList(String... configs) {
+  List ret = new ArrayList<>();
+  for(String config : configs) {
+ret.add(FieldsConfiguration.valueOf(config.toUpperCase()));
+  }
+  return ret;
+}
+
+/**
+ * Return a list of the enums from their string representation.
+ * @param configs
+ * @return
+ */
+public static List toList(List configs) {
+  List ret = new ArrayList<>();
+  for(String config : configs) {
+ret.add(FieldsConfiguration.valueOf(config.toUpperCase()));
+  }
+  return ret;
+}
+
+/**
+ * Construct a Fields object from an iterable of enums.  These fields 
are the fields
+ * exposed in the Storm tuple emitted from the spout.
+ * @param configs
+ * @return
+ */
+public static Fields getFields(Iterable configs) {
+  List fields = new ArrayList<>();
+  for(FieldsConfiguration config : configs) {
+fields.add(config.fieldName);
+  }
+  return new Fields(fields);
+}
+  }
+
+  /**
+   * Build a tuple given the fields and the topic.  We want to use our 
FieldsConfiguration enum
+   * to define what this tuple looks like.
+   * @param  T
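
The enum-of-extractors idea in the quoted Javadoc can be illustrated without Storm or Kafka on the classpath. The following is only a minimal sketch, not the code from the PR: `SketchRecord`, `SketchField`, `fieldNames` and `toValues` are hypothetical names standing in for `ConsumerRecord`, `FieldsConfiguration`, `getFields` and the tuple builder, and plain lists stand in for Storm's `Fields` and `Values`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical stand-in for Kafka's ConsumerRecord so the sketch has no external dependencies.
final class SketchRecord {
  final String topic;
  final int partition;
  final String key;
  final String value;

  SketchRecord(String topic, int partition, String key, String value) {
    this.topic = topic;
    this.partition = partition;
    this.key = key;
    this.value = value;
  }
}

// Each constant pairs a tuple field name with a function that pulls that field out of a record.
enum SketchField {
  KEY("key", r -> r.key),
  VALUE("value", r -> r.value),
  PARTITION("partition", r -> r.partition),
  TOPIC("topic", r -> r.topic);

  final String fieldName;
  final Function<SketchRecord, Object> extractor;

  SketchField(String fieldName, Function<SketchRecord, Object> extractor) {
    this.fieldName = fieldName;
    this.extractor = extractor;
  }

  // Resolve enum constants from their case-insensitive string names.
  static List<SketchField> toList(String... names) {
    List<SketchField> ret = new ArrayList<>();
    for (String name : names) {
      ret.add(SketchField.valueOf(name.toUpperCase(Locale.ROOT)));
    }
    return ret;
  }

  // The declared field names, in the order the fields were configured.
  static List<String> fieldNames(Iterable<SketchField> fields) {
    List<String> ret = new ArrayList<>();
    for (SketchField f : fields) {
      ret.add(f.fieldName);
    }
    return ret;
  }

  // The values for one record, in the same order as fieldNames().
  static List<Object> toValues(SketchRecord record, Iterable<SketchField> fields) {
    List<Object> ret = new ArrayList<>();
    for (SketchField f : fields) {
      ret.add(f.extractor.apply(record));
    }
    return ret;
  }
}
```

Because names and values are both derived from the same ordered list of constants, the declared fields and the emitted values cannot drift out of order, which appears to be the point of driving the tuple shape from the enum.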

[GitHub] incubator-metron pull request #486: METRON-793: Migrate to storm-kafka-clien...

2017-03-22 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/486#discussion_r107436680
  
--- Diff: 
metron-platform/metron-integration-test/src/main/java/org/apache/metron/integration/components/FluxTopologyComponent.java
 ---
@@ -133,7 +138,25 @@ public void start() throws UnableToStartException {
   @Override
   public void stop() {
 if (stormCluster != null) {
-  stormCluster.shutdown();
+  try {
+stormCluster.shutdown();
+if(new File("logs/workers-artifacts").exists()) {
+  Path rootPath = Paths.get("logs");
+  Path destPath = Paths.get("target/logs");
+  try {
+Files.move(rootPath, destPath);
+Files.walk(destPath)
--- End diff --

Could this just be a FileUtils.deleteDirectory(destPath)?
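
For reference, Commons IO's `FileUtils.deleteDirectory` takes a `java.io.File`, so with a `Path` in hand the call would need `destPath.toFile()`. A JDK-only equivalent of a recursive delete can be sketched as below; `LogCleanupSketch` and `deleteRecursively` are hypothetical names for illustration, not code from the PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class LogCleanupSketch {

  // Delete a directory tree using only the JDK; does nothing if the root is absent.
  static void deleteRecursively(Path root) throws IOException {
    if (!Files.exists(root)) {
      return;
    }
    // Reverse order visits children before their parent directories,
    // so every directory is already empty when it is deleted.
    try (Stream<Path> walk = Files.walk(root)) {
      walk.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("logs-sketch");
    Files.createDirectories(root.resolve("workers-artifacts"));
    Files.write(root.resolve("workers-artifacts").resolve("worker.log"), "line".getBytes());
    deleteRecursively(root);
    System.out.println("still exists: " + Files.exists(root));
  }
}
```

The try-with-resources matters here: `Files.walk` holds open directory handles until the stream is closed.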




[GitHub] incubator-metron pull request #482: METRON-791: Add links to website and dow...

2017-03-18 Thread justinleet
GitHub user justinleet opened a pull request:

https://github.com/apache/incubator-metron/pull/482

METRON-791: Add links to website and downloads to top level POM

## Contributor Comments
Per the release thread discussion, I'm quickly throwing a link to the main 
page in the first section, and a link to the releases in the "Obtaining Metron" 
section of the top level POM.  Because it's just a README change, it can be 
validated quickly with Github's "View" button on the changes.

If we want to change verbiage, or link something slightly different, let me 
know and I'll quickly update the PR.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron (Incubating).  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [N/A] Have you included steps to reproduce the behavior or problem that 
is being changed or addressed?
- [N/A] Have you included steps or a guide to how the change may be 
verified and tested manually?
- [N/A] Have you ensured that the full suite of tests and checks have been 
executed in the root incubating-metron folder via:
  ```
  mvn -q clean integration-test install && build_utils/verify_licenses.sh 
  ```

- [N/A] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [N/A] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [N/A] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [x] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and then verify changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  bin/generate-md.sh
  mvn site:site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up for 
your personal repository such that your branches are built there before 
submitting a pull request.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/justinleet/incubator-metron METRON-791

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-metron/pull/482.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #482


commit 2df1dee9dce130d5d3d654fb400db8fcc0903d86
Author: justinjleet 
Date:   2017-03-18T19:54:11Z

adding links






[GitHub] incubator-metron issue #478: METRON-767: Clean up license

2017-03-16 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/478
  
I'm +1 on this by inspection (pending Travis).  Thanks for taking care of 
the mentor feedback on this so quickly.




[GitHub] incubator-metron issue #478: METRON-767: Clean up license from METRON-622

2017-03-16 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/478
  
Actually, looking into this, it's my fault. I split the MIT license stuff 
with the geo share alike.  It should be moved up.

@cestella You want to quick fix that and put up a new PR?




[GitHub] incubator-metron issue #459: METRON-726: Clean up mvn site generation

2017-03-13 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/459
  
@mattf-horton I think this is a pretty good approach and gives us a lot of 
value especially as we build up more releases.  And there is a 
`-Dmaven.site.skip=true` flag for maven that can be added to skip all the site 
stuff.

For right now, I've seen instability in running tests in Jenkins on this 
branch after pulling in master.  I was hoping to resolve it more quickly, but I 
might end up having to roll back bits and pieces of the change/integration 
until I can narrow down what went wrong.




[GitHub] incubator-metron issue #436: METRON-671: Refactor existing Ansible deploymen...

2017-03-10 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/436
  
I'm +1.  I was just waiting for the EC2 component, but was able to get 
quick-dev, etc. spun up without issue.




[GitHub] incubator-metron issue #436: METRON-671: Refactor existing Ansible deploymen...

2017-03-09 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/436
  
@dlyle65535 METRON-745 is in (as I'm sure you can tell from the conflict 
list).  I already incorporated the Kibana map changes, so you should just be 
able to accept master's version.




[GitHub] incubator-metron issue #459: METRON-726: Clean up mvn site generation

2017-03-08 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/459
  
Updated to (hopefully) not blow up on Travis. Surefire needs 
jacoco:prepare-agent to resolve the @{argLine}.  It's only done in Travis, so 
it seems reasonable to just call it directly.

Also makes the surefire version a global and sets it up throughout 
(including in some unversioned spots).  Makes sure @{argLine} is in the 
appropriate  tags. It might also be appropriate in reporting tags, but 
our management of surefire is pretty variable across the board.




[GitHub] incubator-metron issue #459: METRON-726: Clean up mvn site generation

2017-03-08 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/459
  
This PR will need another fix, so I'll update when that's good to go




[GitHub] incubator-metron issue #459: METRON-726: Clean up mvn site generation

2017-03-08 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/459
  
@mmiklavc @dlyle65535 I just updated the PR to work with metron-interface 
(since it came in after this PR).  I also merged in master to get some 
miscellaneous fixes (including in the site-book).

I also added site-book as a component module, so that it gets built at the 
same time and gets pulled into the site (and can be clicked into when spun up). 
 We may or may not want to leave that, depending on if we want that in the base 
mvn clean install.  Let me know if there's a preference.  It's a one line 
change to revert that.

Finally, I added Javadoc report generation to the top level POM, so it's 
integrated directly with the site now (feel free to spin it up again and click 
into Javadocs!)

@mattf-horton This PR now also includes a fatal Javadoc fix.  I think at 
this point, it's mostly the integration that remains (unless something else 
breaks in the meantime), and all the reporting works and gets done in one 
shot.

I haven't created the ticket for doing something nice with the outputs of 
this, just because I didn't know if there would be discussion. I'll go ahead 
and create this, since it seems like people are on board.  Matt, if you have 
any insight into what the appropriate integrations are (e.g. what other 
projects do), I'd love to add that to the ticket to give whoever picks it up a 
little extra guidance.




[GitHub] incubator-metron pull request #475: METRON-745: Create Error Dashboards

2017-03-07 Thread justinleet
GitHub user justinleet opened a pull request:

https://github.com/apache/incubator-metron/pull/475

METRON-745: Create Error Dashboards

## Summary
Following Ryan's work in 
https://github.com/apache/incubator-metron/pull/453, we have the opportunity to 
present errors from our topologies.

It's nothing too complicated, essentially just some high level overviews of 
the various error fields, along with a pane for viewing the actual errors along 
with all their fields.  Note that they include both raw and unique message 
counts (via the hash fields) in most things.
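
To make the raw vs. unique distinction concrete: the dashboards themselves do this in Kibana, but the arithmetic can be sketched in a few lines of Java. This is an illustration only; `ErrorCountSketch` and the sample hash values are made up, and each string stands for the hash field of one error document.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

final class ErrorCountSketch {

  // Raw count: every error document counts once, duplicates included.
  static int rawCount(List<String> hashes) {
    return hashes.size();
  }

  // Unique count: documents that share a hash collapse into a single entry.
  static int uniqueCount(List<String> hashes) {
    return new HashSet<>(hashes).size();
  }

  public static void main(String[] args) {
    // Five error documents, three of which carry the same message hash.
    List<String> hashes = Arrays.asList("a1", "a1", "b2", "a1", "c3");
    System.out.println("raw=" + rawCount(hashes) + " unique=" + uniqueCount(hashes));
    // prints "raw=5 unique=3"
  }
}
```

A large gap between the two numbers suggests one message failing repeatedly rather than many distinct failures.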

This also corrects the error_index.template files.  These are supposed to 
match ErrorFields in Constants.java, but didn't.

I've attached some screenshots, and this can be spun up on quick dev and 
Ambari (both dashboard.p and the various kibana-index.json are updated).  Quick 
dev automatically passes some data through, so it's a good way to get this spun 
up with something interesting showing.

Feedback on what else would be useful and if we want to adjust anything 
would be great.  Keep in mind, we don't actually have a lot of fields to work 
with (because if everything was good, we wouldn't be here in the first place!). 
See error_index.template for the fields we have.

### Testing
Spun up in quick-dev and Ambari.  Quick-dev will automatically put data 
through

### Notes
* I'm really not convinced the 'hostname' visualizations are needed.  The 
field is there and useful, but given that it's populated with the Storm host 
that failed, it seems like it's probably useless most of the time.
* Kibana occasionally rearranges the order of the visualizations (usually 
swapping a couple of the charts). 
 If I recall correctly, that's a known Kibana bug that we're stuck with.
* Keep in mind the graph shifts with the viewing window, so last 15 minutes 
vs. last 7 days each update accordingly.
* This includes a fix to maps mentioned in 
https://github.com/apache/incubator-metron/pull/436.  If that PR goes in before 
this one, this PR should take its own copy of the dashboards.  If this PR 
goes in first, that PR should accept this one's dashboards.

https://cloud.githubusercontent.com/assets/5077341/23672488/e5ae6c9a-033c-11e7-834d-caab26497f0e.png
https://cloud.githubusercontent.com/assets/5077341/23672512/f3ceec5a-033c-11e7-946f-f7d279a4f3b0.png
https://cloud.githubusercontent.com/assets/5077341/23672513/f58b317a-033c-11e7-8a21-b1a927971bba.png
https://cloud.githubusercontent.com/assets/5077341/23672514/f6b6ee90-033c-11e7-9630-0143f2f31fcd.png



The bottom pane extends further down, but we've all seen a table of data 
before.


Thank you for submitting a contribution to Apache Metron (Incubating).
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check
the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root incubating-metron folder via:

```
mvn -q clean integration-test install && build_utils/verify_licenses.sh 
```

- [ ] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related chang

[GitHub] incubator-metron pull request #469: DO NOT MERGE METRON-745: Create Error Da...

2017-03-07 Thread justinleet
Github user justinleet closed the pull request at:

https://github.com/apache/incubator-metron/pull/469




[GitHub] incubator-metron issue #469: DO NOT MERGE METRON-745: Create Error Dashboard...

2017-03-07 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/469
  
I'm going to just close this and open a new, much, much cleaner one.




[GitHub] incubator-metron issue #474: METRON-758: HdfsServiceImplTest should sort fil...

2017-03-07 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/474
  
+1, by inspection




[GitHub] incubator-metron pull request #474: METRON-758: HdfsServiceImplTest should s...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/474#discussion_r104725400
  
--- Diff: 
metron-interface/metron-rest/src/test/java/org/apache/metron/rest/service/impl/HdfsServiceImplTest.java
 ---
@@ -65,6 +66,7 @@ public void listShouldListFiles() throws Exception {
 FileUtils.writeStringToFile(new File(testDir, "file2.txt"), 
"value2");
 
 List paths = hdfsService.list(new Path(testDir));
+Collections.sort(paths, String::compareTo);
--- End diff --

This is nitpicky, but why even specify String::compareTo? 
Collections.sort(paths) uses compareTo by default.
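
A quick sketch of the equivalence, assuming `paths` is a `List<String>` as in the test: the single-argument `Collections.sort` uses the elements' natural ordering, which for `String` is `compareTo`. `NaturalOrderSortSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class NaturalOrderSortSketch {

  // Sort a copy using the natural ordering (Comparable.compareTo).
  static List<String> sortNatural(List<String> in) {
    List<String> copy = new ArrayList<>(in);
    Collections.sort(copy);
    return copy;
  }

  // Sort a copy with the comparator spelled out explicitly.
  static List<String> sortExplicit(List<String> in) {
    List<String> copy = new ArrayList<>(in);
    Collections.sort(copy, String::compareTo);
    return copy;
  }

  public static void main(String[] args) {
    List<String> paths = Arrays.asList("file2.txt", "file1.txt");
    System.out.println(sortNatural(paths).equals(sortExplicit(paths)));
    // prints "true"
  }
}
```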




[GitHub] incubator-metron issue #436: METRON-671: Refactor existing Ansible deploymen...

2017-03-07 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/436
  
@dlyle65535 @ottobackwards Can we link the various Ambari logs so they're 
visible on the local machine?  Anything that blows up in the UI should blow up 
in the logs, which means everything is searchable like it was before.




[GitHub] incubator-metron issue #436: METRON-671: Refactor existing Ansible deploymen...

2017-03-07 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/436
  
@dlyle65535 Perfect, thanks.  I'm going to go ahead and just make that 
change and test in 745.  If this goes in first, 745 just takes its dashboard 
changes entirely.  In the less likely event that 745 goes in first, this PR 
just accepts the changes entirely.




[GitHub] incubator-metron issue #459: METRON-726: Clean up mvn site generation

2017-03-07 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/459
  
Bumping this.  There was a dev list discussion and it spun out 
[METRON-746](https://issues.apache.org/jira/browse/METRON-746) and 
[METRON-747](https://issues.apache.org/jira/browse/METRON-747)

I don't think either of those are blockers to reviewing and pulling in this 
ticket.




[GitHub] incubator-metron pull request #436: METRON-671: Refactor existing Ansible de...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/436#discussion_r104678359
  
--- Diff: metron-deployment/roles/kibana/README.md ---
@@ -1,35 +0,0 @@
-Kibana 4
-
-
-This role installs Kibana along with the default Metron Dashboard.
-
-### FAQ
-
- How do I change Metron's default dashboard?
--- End diff --

I'm inclined (possibly by my own self interest) to make it a follow on jira 
that gets resolved either before or after 745.  745 takes care of this file 
(which is actually why I thought of it).

I definitely agree on using the same file though, but I'm not sure off the 
top of my head how much refactoring happens there.




[GitHub] incubator-metron pull request #436: METRON-671: Refactor existing Ansible de...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/436#discussion_r104673955
  
--- Diff: metron-deployment/roles/kibana/README.md ---
@@ -1,35 +0,0 @@
-Kibana 4
-
-
-This role installs Kibana along with the default Metron Dashboard.
-
-### FAQ
-
- How do I change Metron's default dashboard?
--- End diff --

kibana-index.json still exists in the docker stuff. Given that the module 
is buried in the Ambari stuff, do we still need/want the instructions and an 
elasticdump done to update that?




[GitHub] incubator-metron issue #436: METRON-671: Refactor existing Ansible deploymen...

2017-03-07 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/436
  
@dlyle65535 @nickwallen I'm not sure what the exact fix was with the 
dashboard.  Either approach to fixing will work for me, but I don't know what 
made the map not work to fix it?




[GitHub] incubator-metron pull request #436: METRON-671: Refactor existing Ansible de...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/436#discussion_r104672196
  
--- Diff: metron-deployment/extra_modules/ambari_service_state.py ---
@@ -0,0 +1,352 @@
+#!/usr/bin/python
+#
+#  Licensed to the Apache Software Foundation (ASF) under one or more
+#  contributor license agreements.  See the NOTICE file distributed with
+#  this work for additional information regarding copyright ownership.
+#  The ASF licenses this file to You under the Apache License, Version 2.0
+#  (the "License"); you may not use this file except in compliance with
+#  the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+#
+
+DOCUMENTATION = '''
+---
+module: ambari_service_state
+version_added: "2.1"
+author: Apache Metron (Incubating : 
https://github.com/apache/incubator-metron )
+short_description: Start/Stop/Change Service or Component State
+description:
+- Start/Stop/Change Service or Component State
+options:
+  host:
+description:
+  The hostname for the ambari web server
+  port:
+description:
+  The port for the ambari web server
+  username:
+description:
+  The username for the ambari web server
+  password:
+description:
+  The name of the cluster in web server
+required: yes
+  cluster_name:
+description:
+  The name of the cluster in ambari
+required: yes
+  service_name:
+description:
+  The name of the service to alter
+required: no
+  component_name:
+description:
+  The name of the component to alter
+required: no
+  component_host:
+description:
+  The host running the targeted component. Required when 
component_name is used.
+required: no
+  state:
+description:
+  The desired service/component state.
+  wait_for_complete:
+description:
+  Whether to wait for the request to complete before returning. 
Default is False.
+required: no
+  requirements: [ 'requests']
+'''
+
+EXAMPLES = '''
+# must use full relative path to any files in stored in 
roles/role_name/files/
+- name: Create a new ambari cluster
+ambari_cluster_state:
+  host: localhost
+  port: 8080
+  username: admin
+  password: admin
+  cluster_name: my_cluster
+  cluster_state: present
+  blueprint_var: roles/my_role/files/blueprint.yml
+  blueprint_name: hadoop
+  wait_for_complete: True
+- name: Start the ambari cluster
+  ambari_cluster_state:
+host: localhost
+port: 8080
+username: admin
+password: admin
+cluster_name: my_cluster
+cluster_state: started
+wait_for_complete: True
+- name: Stop the ambari cluster
+  ambari_cluster_state:
+host: localhost
+port: 8080
+username: admin
+password: admin
+cluster_name: my_cluster
+cluster_state: stopped
+wait_for_complete: True
+- name: Delete the ambari cluster
+  ambari_cluster_state:
+host: localhost
+port: 8080
+username: admin
+password: admin
+cluster_name: my_cluster
+cluster_state: absent
+'''
+
+RETURN = '''
+results:
+description: The content of the requests object returned from the 
RESTful call
+returned: success
+type: string
+'''
+
+__author__ = 'apachemetron'
+
+import json
+
+try:
+import requests
+except ImportError:
+REQUESTS_FOUND = False
+else:
+REQUESTS_FOUND = True
+
+
+def main():
+
+argument_spec = dict(
+host=dict(type='str', default=None, required=True),
+port=dict(type='int', default=None, required=True),
+username=dict(type='str', default=None, required=True),
+password=dict(type='str', default=None, required=True),
+cluster_name=dict(type='str', default=None, required=True),
+state=dict(type='str', default=None, required=True,
+   choices=['started

[GitHub] incubator-metron pull request #436: METRON-671: Refactor existing Ansible de...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/436#discussion_r104671925
  
--- Diff: metron-deployment/roles/ambari_config/vars/single_node_vm.yml ---
@@ -80,10 +88,32 @@ configurations:
   - kafka-broker:
   log.dirs: '{{ kafka_log_dirs }}'
   delete.topic.enable: "true"
+  - metron-env:
+  parsers: "bro,snort"
+  - elastic-site:
+  index_number_of_shards: 1
+  index_number_of_replicas: 0
+  zen_discovery_ping_unicast_hosts: "{{ groups.search | join(',') }}"
+  gateway_recover_after_data_nodes: 1
+  network_host: "_lo_,_eth0_,_eth1_"
+  masters_also_are_datanodes: "1"
--- End diff --

I'm fine with whatever works.  It's ES configuration, so if it wants to 
accept "1", it can be my guest.




[GitHub] incubator-metron pull request #436: METRON-671: Refactor existing Ansible de...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/436#discussion_r104665126
  
--- Diff: metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/ELASTICSEARCH/2.3.3/configuration/elastic-site.xml ---
@@ -27,6 +27,14 @@
 Cluster name identifies your cluster
 
 
+masters_also_are_datanodes
+"false"
--- End diff --

No, it's not that important.  Can you add to the description that it has to 
be in quotes for ES compatibility?  I can definitely see that causing confusion 
otherwise.




[GitHub] incubator-metron pull request #436: METRON-671: Refactor existing Ansible de...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/436#discussion_r104659900
  
--- Diff: metron-deployment/roles/ambari_master/tasks/main.yml ---
@@ -38,6 +38,16 @@
   register: ambari_server_setup
   failed_when: ambari_server_setup.stderr
 
+- name: Copy MPack to Ambari Host
+  copy:
+    src: "{{ playbook_dir }}/../packaging/ambari/metron-mpack/target/metron_mpack-0.3.1.0.tar.gz"
--- End diff --

Can we pull the mpack version out into a variable so there are fewer places 
to change it?




[GitHub] incubator-metron pull request #436: METRON-671: Refactor existing Ansible de...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/436#discussion_r104654194
  
--- Diff: metron-deployment/extra_modules/ambari_service_state.py ---
@@ -0,0 +1,352 @@
+#!/usr/bin/python
+#
+#  Licensed to the Apache Software Foundation (ASF) under one or more
+#  contributor license agreements.  See the NOTICE file distributed with
+#  this work for additional information regarding copyright ownership.
+#  The ASF licenses this file to You under the Apache License, Version 2.0
+#  (the "License"); you may not use this file except in compliance with
+#  the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+#
+
+DOCUMENTATION = '''
+---
+module: ambari_service_state
+version_added: "2.1"
+author: Apache Metron (Incubating : 
https://github.com/apache/incubator-metron )
+short_description: Start/Stop/Change Service or Component State
+description:
+- Start/Stop/Change Service or Component State
+options:
+  host:
+description:
+  The hostname for the ambari web server
+  port:
+description:
+  The port for the ambari web server
+  username:
+description:
+  The username for the ambari web server
+  password:
+description:
+  The name of the cluster in web server
+required: yes
+  cluster_name:
+description:
+  The name of the cluster in ambari
+required: yes
+  service_name:
+description:
+  The name of the service to alter
+required: no
+  component_name:
+description:
+  The name of the component to alter
+required: no
+  component_host:
+description:
+  The host running the targeted component. Required when 
component_name is used.
+required: no
+  state:
+description:
+  The desired service/component state.
+  wait_for_complete:
+description:
+  Whether to wait for the request to complete before returning. 
Default is False.
+required: no
+  requirements: [ 'requests']
+'''
+
+EXAMPLES = '''
+# must use full relative path to any files in stored in 
roles/role_name/files/
+- name: Create a new ambari cluster
+ambari_cluster_state:
+  host: localhost
+  port: 8080
+  username: admin
+  password: admin
+  cluster_name: my_cluster
+  cluster_state: present
+  blueprint_var: roles/my_role/files/blueprint.yml
+  blueprint_name: hadoop
+  wait_for_complete: True
+- name: Start the ambari cluster
+  ambari_cluster_state:
+host: localhost
+port: 8080
+username: admin
+password: admin
+cluster_name: my_cluster
+cluster_state: started
+wait_for_complete: True
+- name: Stop the ambari cluster
+  ambari_cluster_state:
+host: localhost
+port: 8080
+username: admin
+password: admin
+cluster_name: my_cluster
+cluster_state: stopped
+wait_for_complete: True
+- name: Delete the ambari cluster
+  ambari_cluster_state:
+host: localhost
+port: 8080
+username: admin
+password: admin
+cluster_name: my_cluster
+cluster_state: absent
+'''
+
+RETURN = '''
+results:
+description: The content of the requests object returned from the 
RESTful call
+returned: success
+type: string
+'''
+
+__author__ = 'apachemetron'
+
+import json
+
+try:
+import requests
+except ImportError:
+REQUESTS_FOUND = False
+else:
+REQUESTS_FOUND = True
+
+
+def main():
+
+argument_spec = dict(
+host=dict(type='str', default=None, required=True),
+port=dict(type='int', default=None, required=True),
+username=dict(type='str', default=None, required=True),
+password=dict(type='str', default=None, required=True),
+cluster_name=dict(type='str', default=None, required=True),
+state=dict(type='str', default=None, required=True,
+   choices=['started

[GitHub] incubator-metron pull request #436: METRON-671: Refactor existing Ansible de...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/436#discussion_r104654898
  
--- Diff: metron-deployment/extra_modules/ambari_service_state.py ---
@@ -0,0 +1,352 @@
+#!/usr/bin/python
+#
+#  Licensed to the Apache Software Foundation (ASF) under one or more
+#  contributor license agreements.  See the NOTICE file distributed with
+#  this work for additional information regarding copyright ownership.
+#  The ASF licenses this file to You under the Apache License, Version 2.0
+#  (the "License"); you may not use this file except in compliance with
+#  the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+#
+
+DOCUMENTATION = '''
+---
+module: ambari_service_state
+version_added: "2.1"
+author: Apache Metron (Incubating : 
https://github.com/apache/incubator-metron )
+short_description: Start/Stop/Change Service or Component State
+description:
+- Start/Stop/Change Service or Component State
+options:
+  host:
+description:
+  The hostname for the ambari web server
+  port:
+description:
+  The port for the ambari web server
+  username:
+description:
+  The username for the ambari web server
+  password:
+description:
+  The name of the cluster in web server
+required: yes
+  cluster_name:
+description:
+  The name of the cluster in ambari
+required: yes
+  service_name:
+description:
+  The name of the service to alter
+required: no
+  component_name:
+description:
+  The name of the component to alter
+required: no
+  component_host:
+description:
+  The host running the targeted component. Required when 
component_name is used.
+required: no
+  state:
+description:
+  The desired service/component state.
+  wait_for_complete:
+description:
+  Whether to wait for the request to complete before returning. 
Default is False.
+required: no
+  requirements: [ 'requests']
+'''
+
+EXAMPLES = '''
+# must use full relative path to any files in stored in 
roles/role_name/files/
+- name: Create a new ambari cluster
+ambari_cluster_state:
+  host: localhost
+  port: 8080
+  username: admin
+  password: admin
+  cluster_name: my_cluster
+  cluster_state: present
+  blueprint_var: roles/my_role/files/blueprint.yml
+  blueprint_name: hadoop
+  wait_for_complete: True
+- name: Start the ambari cluster
+  ambari_cluster_state:
+host: localhost
+port: 8080
+username: admin
+password: admin
+cluster_name: my_cluster
+cluster_state: started
+wait_for_complete: True
+- name: Stop the ambari cluster
+  ambari_cluster_state:
+host: localhost
+port: 8080
+username: admin
+password: admin
+cluster_name: my_cluster
+cluster_state: stopped
+wait_for_complete: True
+- name: Delete the ambari cluster
+  ambari_cluster_state:
+host: localhost
+port: 8080
+username: admin
+password: admin
+cluster_name: my_cluster
+cluster_state: absent
+'''
+
+RETURN = '''
+results:
+description: The content of the requests object returned from the 
RESTful call
+returned: success
+type: string
+'''
+
+__author__ = 'apachemetron'
+
+import json
+
+try:
+import requests
+except ImportError:
+REQUESTS_FOUND = False
+else:
+REQUESTS_FOUND = True
+
+
+def main():
+
+argument_spec = dict(
+host=dict(type='str', default=None, required=True),
+port=dict(type='int', default=None, required=True),
+username=dict(type='str', default=None, required=True),
+password=dict(type='str', default=None, required=True),
+cluster_name=dict(type='str', default=None, required=True),
+state=dict(type='str', default=None, required=True,
+   choices=['started

[GitHub] incubator-metron pull request #436: METRON-671: Refactor existing Ansible de...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/436#discussion_r104658438
  
--- Diff: metron-deployment/roles/ambari_config/vars/single_node_vm.yml ---
@@ -80,10 +88,32 @@ configurations:
   - kafka-broker:
   log.dirs: '{{ kafka_log_dirs }}'
   delete.topic.enable: "true"
+  - metron-env:
+  parsers: "bro,snort"
+  - elastic-site:
+  index_number_of_shards: 1
+  index_number_of_replicas: 0
+  zen_discovery_ping_unicast_hosts: "{{ groups.search | join(',') }}"
+  gateway_recover_after_data_nodes: 1
+  network_host: "_lo_,_eth0_,_eth1_"
+  masters_also_are_datanodes: "1"
--- End diff --

Wasn't this a boolean earlier? Should this be true?




[GitHub] incubator-metron pull request #436: METRON-671: Refactor existing Ansible de...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/436#discussion_r104655538
  
--- Diff: metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/ELASTICSEARCH/2.3.3/configuration/elastic-site.xml ---
@@ -27,6 +27,14 @@
 Cluster name identifies your cluster
 
 
+masters_also_are_datanodes
+"false"
--- End diff --

Can we refactor things so that this is just `false`, rather than `"false"`?




[GitHub] incubator-metron pull request #436: METRON-671: Refactor existing Ansible de...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/436#discussion_r104661243
  
--- Diff: metron-deployment/roles/quick_dev/tasks/main.yml ---
@@ -15,23 +15,50 @@
 #  limitations under the License.
 #
 ---
-#
-# Workaround for Kafka not starting
-# Fire off async start followed by
-# Sync start -execution will pause until
-# final start completes.
-#
-- name: Start the ambari cluster - no wait
-  ambari_cluster_state:
+- name: Delete the Metron Components from Ambari
+  ambari_service_state:
 host: "{{ groups.ambari_master[0] }}"
 port: "{{ ambari_port }}"
 username: "{{ ambari_user }}"
 password: "{{ ambari_password }}"
 cluster_name: "{{ cluster_name }}"
-cluster_state: started
-wait_for_complete: False
+state: deleted
+component_name: "{{ item }}"
+component_host: "{{ inventory_hostname }}"
+  with_items:
+- METRON_ENRICHMENT_MASTER
+- METRON_INDEXING
+- METRON_PARSERS
+
+- name: Remove the Metron packages
+  package:
+name: "{{ item }}"
+state: absent
+  with_items:
+- metron-common
+- metron-data-management
+- metron-parsers
+- metron-enrichment
+- metron-indexing
+- metron-elasticsearch
+
+- name: Re-install the Metron Packages via Ambari
--- End diff --

Do we have any issues with the configured files still existing after this? 
I know RPMs don't like to touch files they didn't explicitly create, so I'm a 
little worried they still exist here.

I know guards were added (the `if(is_*_configured` checks earlier), but I just 
want to call it out.




[GitHub] incubator-metron pull request #436: METRON-671: Refactor existing Ansible de...

2017-03-07 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/436#discussion_r104653567
  
--- Diff: metron-deployment/roles/kibana/README.md ---
@@ -1,35 +0,0 @@
-Kibana 4
-
-
-This role installs Kibana along with the default Metron Dashboard.
-
-### FAQ
-
- How do I change Metron's default dashboard?
--- End diff --

If you know where you want it, I'm actually writing that up as part of 
METRON-745 (which needed to use that module).




[GitHub] incubator-metron issue #471: METRON-755 Update GitHub PR Template

2017-03-06 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/471
  
I prefer top, but I don't really care that much.




[GitHub] incubator-metron issue #469: DO NOT MERGE METRON-745: Create Error Dashboard...

2017-03-02 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/469
  
An alternative, and more sensible/readable, approach to the errors-over-time graph:
https://cloud.githubusercontent.com/assets/5077341/23526431/06c36a1a-ff60-11e6-93f1-dd8437fd0688.png





[GitHub] incubator-metron pull request #469: DO NOT MERGE METRON-745: Create Error Da...

2017-03-02 Thread justinleet
GitHub user justinleet opened a pull request:

https://github.com/apache/incubator-metron/pull/469

DO NOT MERGE METRON-745: Create Error Dashboards

# DO NOT MERGE

## Summary
Based on Ryan's work in 
https://github.com/apache/incubator-metron/pull/453, I went ahead and created 
a Kibana dashboard for tracking errors.  **That PR is not finalized in 
master so this should not be merged!** However, the data flowing to the index 
is pretty final, so unless the actual fields or field names change, it doesn't 
really affect this.

All we care about here is the dashboard itself, but unfortunately the 453 
changes get pulled along for the ride until that's in.

It's nothing too complicated, essentially just some high level overviews of 
the various fields output by Ryan (some counts, etc.), along with a pane for 
viewing the actual errors along with all their fields.  Note that they include 
both raw and unique message counts (via the hash fields) in most things.

I've attached some screenshots, but this can also be spun up on an 
Ambari cluster (and eventually will have to be in order to validate it, given 
that the file isn't in a readable format).

I'm basically looking for feedback on what else would be useful and if we 
want to adjust anything.  Keep in mind, we don't actually have a lot of fields 
to work with (because if everything was good, we wouldn't be here in the first 
place!). See error_index.template for the fields we have.

### Notes
* I'm really not convinced the 'hostname' visualizations are needed.  The 
field is there and useful, but given that it's populated with the Storm host 
that failed, it seems like it's probably useless most of the time.
* Kibana occasionally rearranges the order of the visualizations (usually 
swapping a couple of the charts).  If I recall correctly, that's a known Kibana 
bug that we're stuck with.
* The graph teaches a lesson of "Don't load all your data at once if you 
want a pretty graph". Still, it's just a basic graph of the error counts over 
time.
* Keep in mind the graph shifts with the viewing window, so switching from the 
last 15 minutes to the last 7 days updates it accordingly.

https://cloud.githubusercontent.com/assets/5077341/23518699/52eb58bc-ff42-11e6-912c-cc596fe46a3d.png
https://cloud.githubusercontent.com/assets/5077341/23518700/549c3f0a-ff42-11e6-8e26-18553ce804bc.png
https://cloud.githubusercontent.com/assets/5077341/23518702/5605c69a-ff42-11e6-8c76-15f485253e8f.png

The bottom pane extends further down, but we've all seen a table of data 
before.

### For all changes:
- [] Is there a JIRA ticket associated with this PR? If not one needs to be 
created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
- [] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [ ] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [ ] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [ ] Have you ensured that the full suite of tests and checks have been 
executed in the root incubating-metron folder via:

```
mvn -q clean integration-test install && build_utils/verify_licenses.sh 
```

- [ ] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and then verify the changes via 
site-book/target/site/index.html.

```
cd site-book
bin/generate-md.sh
mvn site:site

```

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up for 
your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git

[GitHub] incubator-metron issue #453: METRON-694: Index Errors from Topologies

2017-03-02 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/453
  
I tried running this up and discovered that there's at least one error that 
doesn't get caught.  JSON parsing errors, e.g. if someone sends outright badly 
formatted messages to indexing (such as one missing the closing '}'), don't get 
caught and indexed right now.

I don't believe we ever handled this type of error, because I don't think 
it ever occurs from our code directly.  I'm inclined to not worry about it for 
this PR given that we never worried about it to begin with, but we may want to 
create a follow-on Jira to ensure that we handle cases like this well.  As we 
add and increase visibility to extension points, we don't want things like this 
getting tripped by custom code.

Anyone have objections to that?




[GitHub] incubator-metron issue #465: METRON-741: Stellar Field Transformations shoul...

2017-02-24 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/465
  
+1, by inspection.  Thanks for grabbing this.




[GitHub] incubator-metron issue #464: METRON-740: Normalizing and adding log4j proper...

2017-02-24 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/464
  
+1 by inspection.  Nice to have this setup




[GitHub] incubator-metron issue #438: METRON-686 Record Rule Set that Fired During Th...

2017-02-24 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/438
  
@nickwallen I have a slight preference towards flattening, fixing, and 
unflattening. I'd rather conform to convention and keep things consistent for 
now.  I could pretty easily be persuaded to go with option 1 if there's enough 
support for it and we think we'll address it relatively quickly.





[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

2017-02-23 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/463
  
Yep, my +1 is still in place.




[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

2017-02-23 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/463
  
I figured out why.

From the docs:
```

   List list = new LinkedList();
   List spy = spy(list);

   //Impossible: real method is called so spy.get(0) throws IndexOutOfBoundsException (the list is yet empty)
   when(spy.get(0)).thenReturn("foo");

   //You have to use doReturn() for stubbing
   doReturn("foo").when(spy).get(0);
```
So the first call to @cestella's trySplit() happens inside the when() itself, 
not in the lambda; only the subsequent calls go through the lambda.  So 
everything shifts by one.

As the docs note "Sometimes it's impossible or impractical to use 
when(Object) for stubbing spies. Therefore when using spies please consider 
doReturn|Answer|Throw() family of methods for stubbing."

I just happened to try this, and now I know why it works.
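
The shift by one can be reproduced without Mockito at all.  The sketch below is only an illustration under stated assumptions: `stub` is a made-up stand-in for a when(...)-style call and `EagerArgumentDemo` is a hypothetical class, neither is Mockito or Metron code.  It shows just the underlying Java rule: the argument expression is evaluated on the real object before the stubbing method ever sees it.

```java
import java.util.Arrays;
import java.util.Iterator;

class EagerArgumentDemo {

  // Hypothetical stand-in for a when(...)-style call: it receives the *result*
  // of the method call, never the call itself.
  static <T> void stub(T alreadyComputed) {
    // Nothing to do; the real method has already run by the time we get here.
  }

  // Returns the first element a caller sees after the (optional) stubbing step.
  static String firstSeenAfterStubbing(boolean stubWithEagerCall) {
    Iterator<String> real = Arrays.asList("first", "second", "third").iterator();
    if (stubWithEagerCall) {
      // Java evaluates real.next() first, so "first" is consumed right here.
      stub(real.next());
    }
    return real.next();
  }

  public static void main(String[] args) {
    System.out.println(firstSeenAfterStubbing(true));
    System.out.println(firstSeenAfterStubbing(false));
  }
}
```

With the eager call, one real invocation has already been spent before any counting logic could observe it, which is the same off-by-one as with when() on a spy.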




[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

2017-02-23 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/463
  
@cestella spy() ends up working differently than mock(), from what I can tell.

This worked for me:
```
Spliterator delegatingSpliterator = spy(spliterator);
doAnswer(invocationOnMock -> {
  Spliterator ret = spliterator.trySplit();
  if(ret != null) {
    numSplits.incrementAndGet();
  }
  return ret;
}).when(delegatingSpliterator).trySplit();
```




[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

2017-02-23 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/463
  
+1, I appreciate you going ahead and taking this ticket, given that I've 
been bitten by it twice now.  Looks great.




[GitHub] incubator-metron pull request #463: METRON-728: ReaderSpliteratorTest fails ...

2017-02-23 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/463#discussion_r102776136
  
--- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/utils/file/ReaderSpliteratorTest.java ---
@@ -97,88 +110,73 @@ public void testSequentialStreamLargeBatch() throws FileNotFoundException {
   Map count =
   stream.map(s -> s.trim())
   .collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
-  Assert.assertEquals(5, count.size());
-  Assert.assertEquals(3, (int) count.get("foo"));
-  Assert.assertEquals(2, (int) count.get("bar"));
-  Assert.assertEquals(1, (int) count.get("and"));
-  Assert.assertEquals(1, (int) count.get("the"));
+  validateMapCount(count);
 }
   }
 
-  @Test
-  public void testActuallyParallel() throws ExecutionException, InterruptedException, FileNotFoundException {
-//With 9 elements and a batch of 2, we should only ceil(9/2) = 5 batches, so at most min(5, 2) = 2 threads will be used
-try( Stream stream = ReaderSpliterator.lineStream(getReader(), 2)) {
-  ForkJoinPool forkJoinPool = new ForkJoinPool(2);
-  forkJoinPool.submit(() -> {
-Map threads =
-stream.parallel().map(s -> Thread.currentThread().getName())
-.collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
-Assert.assertTrue(threads.size() <= 2);
-  }
-  ).get();
-}
-  }
+  private int getNumberOfBatches(final ReaderSpliterator spliterator) throws ExecutionException, InterruptedException {
+final AtomicInteger numSplits = new AtomicInteger(0);
+//we want to wrap the spliterator and count the (valid) splits
+Spliterator delegatingSpliterator = new Spliterator() {
+  @Override
+  public boolean tryAdvance(Consumer action) {
+return spliterator.tryAdvance(action);
+  }
 
-  @Test
-  public void testActuallyParallel_mediumBatch() throws ExecutionException, InterruptedException, FileNotFoundException {
-//With 9 elements and a batch of 2, we should only ceil(9/2) = 5 batches, so at most 5 threads of the pool of 10 will be used
-try( Stream stream = ReaderSpliterator.lineStream(getReader(), 2)) {
-  ForkJoinPool forkJoinPool = new ForkJoinPool(10);
-  forkJoinPool.submit(() -> {
-Map threads =
-stream.parallel().map(s -> Thread.currentThread().getName())
-.collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
-Assert.assertTrue(threads.size() <= (int) Math.ceil(9.0 / 2) && threads.size() > 1);
-  }
-  ).get();
-}
+  @Override
+  public Spliterator trySplit() {
+Spliterator ret = spliterator.trySplit();
+if(ret != null) {
+  numSplits.incrementAndGet();
+}
+return ret;
+  }
+
+  @Override
+  public long estimateSize() {
+return spliterator.estimateSize();
+  }
+
+  @Override
+  public int characteristics() {
+return spliterator.characteristics();
+  }
+};
+
+Stream stream = StreamSupport.stream(delegatingSpliterator, true);
+
+//now run it in a parallel pool and do some calculation that doesn't really matter.
+ForkJoinPool forkJoinPool = new ForkJoinPool(10);
--- End diff --

Incredibly minor point, but since we no longer care about the actual 
execution and aren't running it a lot, it seems appropriate to just use 
ForkJoinPool.commonPool(), and drop the shutdown line.

This is entirely up to you if you want to change, I don't consider it 
blocking by any means.
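
For illustration, a minimal sketch of what using the common pool could look like. The class name `CommonPoolSketch` and the helper `trimInCommonPool` are hypothetical and not part of the PR; the point is only that `ForkJoinPool.commonPool()` is shared by the JVM, so the test has nothing to construct or shut down.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

// Hypothetical illustration class, not taken from the Metron code base.
public class CommonPoolSketch {

  // Submits a parallel stream computation to the shared common pool and waits for the result.
  // The pool is owned by the JVM, so no new ForkJoinPool(...) and no shutdown() call is needed.
  static List<String> trimInCommonPool(List<String> lines)
      throws ExecutionException, InterruptedException {
    return ForkJoinPool.commonPool()
        .submit(() -> lines.parallelStream()
            .map(String::trim)
            .collect(Collectors.toList()))
        .get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(trimInCommonPool(Arrays.asList(" foo ", "bar ", " and")));
  }
}
```

The trade-off is that the common pool's parallelism is decided by the JVM rather than by the test, which seems acceptable once the test no longer asserts on thread counts.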




[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

2017-02-23 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/463
  
@cestella that is a much better way of stating it, and exactly what I was 
alluding to.  I'll look through the new commit.




[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

2017-02-23 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/463
  
@cestella The more I'm thinking about this, the more I wonder if this test 
is inherently structured incorrectly. My thinking is that it seems more like 
we're testing whether or not a stream can run in parallel, rather than that the 
stream produced by the spliterator meets the appropriate contracts for a stream.

Is there a way to restructure this so that it just tests "Does this meet 
the criteria of a Java stream?", rather than "Can a stream in Java run in 
parallel?"
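
As a rough sketch of the kind of check I mean, assuming a plain list-backed spliterator as a stand-in for the one produced by `ReaderSpliterator` (the class `SpliteratorContractSketch` and its methods are hypothetical): traverse the same source sequentially and in parallel and require that both see the same elements the same number of times, without asserting anything about which threads did the work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Spliterator;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical illustration class, not taken from the Metron code base.
public class SpliteratorContractSketch {

  // Counts how often each element occurs, reading the spliterator sequentially or in parallel.
  static Map<String, Integer> countElements(Spliterator<String> spliterator, boolean parallel) {
    return StreamSupport.stream(spliterator, parallel)
        .collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
  }

  // True if a sequential and a parallel traversal yield exactly the same element counts.
  // A fresh spliterator is requested for each traversal because a spliterator is single-use.
  static boolean sameElementsEitherWay(Supplier<Spliterator<String>> source) {
    return countElements(source.get(), false).equals(countElements(source.get(), true));
  }

  public static void main(String[] args) {
    // Stand-in data; a real test would supply the spliterator under test instead.
    List<String> lines = Arrays.asList("foo", "bar", "foo", "and", "the", "foo", "bar");
    System.out.println(sameElementsEitherWay(lines::spliterator));
  }
}
```

A check like this is deterministic regardless of how many threads the JVM decides to use, which is the property the current test lacks.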




[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

2017-02-23 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/463
  
Never mind, I can't read.  You ran the whole test 100k times, correct?  I'm 
fine with that.




[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

2017-02-23 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/463
  
Are we settling on "less sporadic"?  As I noted in the ticket, I had the 
original test run for over a minute (~90 seconds) before the JVM decided to 
actually be single threaded.  It's not the usual case, but I probably only ran 
it 20 or so times before I hit the 90 second case.

It seems more likely to fail in Travis, which is fine, but I'm not sure I 
want my local build failing that often.





[GitHub] incubator-metron issue #462: METRON-734 Builds failing because of MaxMind DB...

2017-02-23 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/462
  
Apparently https://issues.apache.org/jira/browse/METRON-728 occurs more 
frequently on Travis than on my local machine.

The Travis build running on my personal account has already succeeded 
(https://travis-ci.org/justinleet/incubator-metron/builds/204210174).

I'll kick Travis and hopefully we aren't waiting 16 hours for a build.




[GitHub] incubator-metron pull request #462: METRON-734 Builds failing because of Max...

2017-02-23 Thread justinleet
Github user justinleet closed the pull request at:

https://github.com/apache/incubator-metron/pull/462



