GitHub user nickwallen commented on a diff in the pull request:
https://github.com/apache/metron/pull/622#discussion_r127330773
--- Diff:
metron-analytics/metron-profiler-common/src/main/java/org/apache/metron/profiler/hbase/DecodableRowKeyBuilder.java
---
@@ -0,0 +1,382 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.metron.profiler.hbase;
+
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.metron.profiler.ProfileMeasurement;
+import org.apache.metron.profiler.ProfilePeriod;
+
+import java.nio.BufferUnderflowException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Responsible for building the row keys used to store profile data in HBase.
+ *
+ * This builder generates decodable row keys. A decodable row key is one that can be interrogated to extract
+ * the constituent components of that row key. Given a previously generated row key this builder
+ * can extract the profile name, entity name, group name(s), period duration, and period.
+ *
+ * The row key is composed of the following fields.
+ * <ul>
+ * <li>magic number - Helps to validate the row key.</li>
+ * <li>version - The version number of the row key.</li>
+ * <li>salt - A salt that helps prevent hot-spotting.
+ * <li>profile - The name of the profile.
+ * <li>entity - The name of the entity being profiled.
+ * <li>group(s) - The group(s) used to sort the data in HBase. For example, a group may distinguish between weekends and weekdays.
+ * <li>period - The period in which the measurement was taken. The first period starts at the epoch and increases monotonically.
+ * </ul>
+ */
+public class DecodableRowKeyBuilder implements RowKeyBuilder {
+
+ /**
+ * Defines the byte order when encoding and decoding the row keys.
+ *
+ * Making this configurable is likely not necessary and is left as a practice exercise for the reader. :)
+ */
+ private static final ByteOrder byteOrder = ByteOrder.BIG_ENDIAN;
+
+ /**
+ * Defines some level of sane max field length to avoid any shenanigans with oddly encoded row keys.
+ */
+ private static final int MAX_FIELD_LENGTH = 1000;
+
+ /**
+ * A magic number embedded in each row key to help validate the row key and byte ordering when decoding.
+ */
+ protected static final short MAGIC_NUMBER = 77;
+
+ /**
+ * The version number of the row keys supported by this builder.
+ */
+ protected static final byte VERSION = (byte) 1;
--- End diff --
I added a `VERSION` field to the row key in the hope that it will make
future changes to the `RowKeyBuilder` easier. With it, I could read the version
while parsing a row key and then choose the matching `RowKeyBuilder`
implementation, that is, the one that was used to create the row key. This
would make row key changes seamless to users.
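
To make the idea concrete, here is a rough sketch of version-based dispatch. It is not code from this PR. It assumes the row key starts with the magic number as a big-endian `short` followed by the version as a single `byte`, which matches the field order in the Javadoc but not necessarily the final encoding. The names `RowKeyVersionSketch`, `KeyDecoder`, `register`, `decoderFor`, and `header` are all hypothetical.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: picks a decoder based on the version byte in a row key header.
class RowKeyVersionSketch {

  // Same value as MAGIC_NUMBER in the diff; the header layout below is an assumption.
  public static final short MAGIC_NUMBER = 77;

  /** Hypothetical stand-in for a version-specific RowKeyBuilder implementation. */
  public interface KeyDecoder {
    String describe();
  }

  // Maps a row key version to the decoder that understands that version.
  private final Map<Byte, KeyDecoder> decoders = new HashMap<>();

  public void register(byte version, KeyDecoder decoder) {
    decoders.put(version, decoder);
  }

  /**
   * Reads the assumed header (magic number, then version) and returns the
   * decoder registered for that version. Returns empty if the magic number
   * does not match, the key is too short, or the version is unknown.
   */
  public Optional<KeyDecoder> decoderFor(byte[] rowKey) {
    try {
      ByteBuffer buffer = ByteBuffer.wrap(rowKey).order(ByteOrder.BIG_ENDIAN);
      if (buffer.getShort() != MAGIC_NUMBER) {
        return Optional.empty();
      }
      byte version = buffer.get();
      return Optional.ofNullable(decoders.get(version));
    } catch (BufferUnderflowException e) {
      // Fewer than three bytes: not a row key this sketch can interpret.
      return Optional.empty();
    }
  }

  /** Builds only the assumed three-byte header, for demonstration purposes. */
  public static byte[] header(short magic, byte version) {
    return ByteBuffer.allocate(3)
        .order(ByteOrder.BIG_ENDIAN)
        .putShort(magic)
        .put(version)
        .array();
  }

  public static void main(String[] args) {
    RowKeyVersionSketch sketch = new RowKeyVersionSketch();
    sketch.register((byte) 1, () -> "decoder for version 1 row keys");

    byte[] key = header(MAGIC_NUMBER, (byte) 1);
    System.out.println(sketch.decoderFor(key).map(KeyDecoder::describe).orElse("no decoder"));
  }
}
```

A future row key format would bump the version byte and register its own decoder, so keys written by the old format would still resolve to the old implementation.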
---