Per (2), I think it makes sense to make the charset configurable, but with the proposal of 3 separate settings, wouldn't things blow up horribly if the Parsers are producing UTF-8, but Enrichment is expecting UTF-16? They are not even speaking the same language, no?
This makes me think that we need to understand the scenarios under which a user would want to change the charset, before we know what kinds of levers to bake-in. What sort of use case would drive someone to change the charset? Would there be a particular sensor producing telemetry with a different charset from others? If one source of telemetry is different than the others, would the entire system even work with varying charsets? Without a good understanding of use cases, I think the only mildly safe thing to do right now, is to have a single, global charset setting. The user would have to ensure that their entire environment and all the JVMs driving it are set to use that charset. Perhaps my questioning comes from a lack of understanding of charsets. I can't remember ever having to deviate from the default. Please chime in and educate me, if I am off base. On Fri, Apr 21, 2017 at 8:50 AM, JJ Meyer <jjmey...@gmail.com> wrote: > Hello everybody, > > Currently our build has a significant amount of warnings (most from the > error prone plugin). A lot of this has been documented here: > https://issues.apache.org/jira/browse/METRON-617 > > I want to continue to work on this Jira. I really want to reduce the number > of warnings in our build. As the Jira points out, a lot of them are > unchecked casts or the implicit use of default charset. > > When starting this, I want to start with a specific module. I plan on > starting with `metron-interface` as it's fairly self contained and is > pretty new. Below I want to layout what I plan on doing. Please let me know > what you all think: > > 1. Suppress warnings where generics are not supported or checking type is > not possible. > 2. For all unit tests that require supplying a charset I'll supply the > UTF-8 charset. > 3. Update the API to have a charset configuration for each resource that > needs one. > 4. Remove @SuppressWarnings("ALL") on all unit tests. I found out error > prone doesn't support this level of suppression. Which is probably a good > thing. We will need to explicitly state what we want to suppress instead. > > Two big questions came up right away when I was thinking about the above: > > 1. When suppressing warnings. I see we sometimes cast a JSONObject to > Map<String, Object>. I know it extends Map, but is it really safe to do > something like the following? Should we have a utility that truly builds a > map that uses generics? I plan on doing some testing around this, but if > anyone has any experience with this it would be greatly appreciated. Here > is an example of what I am talking about: > > JSONObject message = ...; > @SuppressWarnings("unchecked") > Map<String, Object> state = (Map<String, Object>) message; > > > 2. This one is about configuring charset (#3 above). Specifically in > metron-rest. Right now, I believe there are 3 sensor resources (index, > enrichment, and parser) that throw warnings due to implicitly using the > default charset. I propose that we have 3 configs (listed below). These > configs would take any valid charset, default, or not set. If not set then > UTF-8 would be default. Does this seem fair? > > sensor: > index.encoding: UTF-8 > enrichment.encoding: UTF-8 > parser.encoding: UTF-8 > > > Does anyone see any problems that may occur if we go about it this way? Any > help on this would be very much appreciated. > > Thanks, > JJ >