Github user justinleet commented on a diff in the pull request:
https://github.com/apache/metron/pull/824#discussion_r150854643
--- Diff:
metron-platform/metron-indexing/src/main/java/org/apache/metron/indexing/dao/HBaseDao.java
---
@@ -135,8 +138,9 @@ private Document getDocumentFromResult(Result result)
throws IOException {
Map.Entry<byte[], byte[]> entry= columns.lastEntry();
Long ts = Bytes.toLong(entry.getKey());
if(entry.getValue()!= null) {
- String json = new String(entry.getValue());
- return new Document(json, Bytes.toString(result.getRow()), null, ts);
+ Map<String, Object> json = JSONUtils.INSTANCE.load(new
String(entry.getValue()), new TypeReference<Map<String, Object>>() {
+ });
+ return new Document(json, Bytes.toString(result.getRow()), (String)
json.get(SOURCE_TYPE), ts);
--- End diff --
I would prefer to see one of two things happen here. Either we keep the
constant in the ES specific classes (which is admittedly less than ideal, but
it does limit the pollution of ES knowledge into HBase classes) and populate
source type from there (basically moving the loading and source type population
there). Alternatively, we pass in a more general function that can be applied
to the fields and configure and handle it appropriately.
I think the second one is probably more general useful to be able to do,
but given the state of ES5 upgrade making this particular case obsolete, I'm
amenable to doing the first option.
At bare minimum we should replace the '.'s with ':'s only if present. Even
if there's not a Solr implementation, I don't want HBaseDao tied to ES so
directly.
@cestella Do you have a preference on implementation? I know you'd had
some comments earlier, but I don't want to put words in your mouth.
---