Per (2), I think it makes sense to make the charset configurable, but with
the proposal of 3 separate settings, wouldn't things blow up horribly if
the Parsers are producing UTF-8, but Enrichment is expecting UTF-16?  They
are not even speaking the same language, no?

This makes me think that we need to understand the scenarios under which a
user would want to change the charset, before we know what kinds of levers
to bake-in.  What sort of use case would drive someone to change the
charset?  Would there be a particular sensor producing telemetry with a
different charset from others?  If one source of telemetry is different
than the others, would the entire system even work with varying charsets?

Without a good understanding of use cases, I think the only mildly safe
thing to do right now, is to have a single, global charset setting.  The
user would have to ensure that their entire environment and all the JVMs
driving it are set to use that charset.

Perhaps my questioning comes from a lack of understanding of charsets.  I
can't remember ever having to deviate from the default.  Please chime in
and educate me, if I am off base.






On Fri, Apr 21, 2017 at 8:50 AM, JJ Meyer <jjmey...@gmail.com> wrote:

> Hello everybody,
>
> Currently our build has a significant amount of warnings (most from the
> error prone plugin). A lot of this has been documented here:
> https://issues.apache.org/jira/browse/METRON-617
>
> I want to continue to work on this Jira. I really want to reduce the number
> of warnings in our build. As the Jira points out, a lot of them are
> unchecked casts or the implicit use of default charset.
>
> When starting this, I want to start with a specific module. I plan on
> starting with `metron-interface` as it's fairly self contained and is
> pretty new. Below I want to layout what I plan on doing. Please let me know
> what you all think:
>
> 1. Suppress warnings where generics are not supported or checking type is
> not possible.
> 2. For all unit tests that require supplying a charset I'll supply the
> UTF-8 charset.
> 3. Update the API to have a charset configuration for each resource that
> needs one.
> 4. Remove @SuppressWarnings("ALL") on all unit tests. I found out error
> prone doesn't support this level of suppression. Which is probably a good
> thing. We will need to explicitly state what we want to suppress instead.
>
> Two big questions came up right away when I was thinking about the above:
>
> 1. When suppressing warnings. I see we sometimes cast a JSONObject to
> Map<String, Object>. I know it extends Map, but is it really safe to do
> something like the following? Should we have a utility that truly builds a
> map that uses generics? I plan on doing some testing around this, but if
> anyone has any experience with this it would be greatly appreciated. Here
> is an example of what I am talking about:
>
> JSONObject message = ...;
> @SuppressWarnings("unchecked")
> Map<String, Object> state = (Map<String, Object>) message;
>
>
> 2. This one is about configuring charset (#3 above). Specifically in
> metron-rest. Right now, I believe there are 3 sensor resources (index,
> enrichment, and parser) that throw warnings due to implicitly using the
> default charset. I propose that we have 3 configs (listed below). These
> configs would take any valid charset, default, or not set. If not set then
> UTF-8 would be default. Does this seem fair?
>
> sensor:
>   index.encoding: UTF-8
>   enrichment.encoding: UTF-8
>   parser.encoding: UTF-8
>
>
> Does anyone see any problems that may occur if we go about it this way? Any
> help on this would be very much appreciated.
>
> Thanks,
> JJ
>

Reply via email to