Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/729#discussion_r102865583
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -390,4 +391,15 @@
String DYNAMIC_UDF_SUPPORT_ENABLED = "exec.udf.enable_dynamic_support";
BooleanValidator DYNAMIC_UDF_SUPPORT_ENABLED_VALIDATOR = new BooleanValidator(DYNAMIC_UDF_SUPPORT_ENABLED, true, true);
+
+ /**
+ * Option whose value is a long value representing the number of bits required for computing ndv (using HLL)
+ */
+ LongValidator NDV_MEMORY_LIMIT = new PositiveLongValidator("exec.statistics.ndv_memory_limit", 30, 20);
+
+ /**
+ * Option whose value represents the current version of the statistics. Decreasing the value will generate
+ * the older version of statistics
+ */
+ LongValidator STATISTICS_VERSION = new NonNegativeLongValidator("exec.statistics.capability_version", 1, 1);
--- End diff --
Having a statistics version number makes sense. What I disagree with is how
we are managing the version.
The version is defined by the code that gathers and writes the stats. If
I'm running a Drill build that has version 3 of the implementation, I write
version 3 files. That version number should be a constant defined in the code.
When we change the stats format, we bump the version number.
The reader should handle old versions of the file: at least one older
version (to ease software upgrades). The reader retrieves the version from the
file and checks whether it is supported by the reader implementation.
This is all very standard practice.
Where, then, is there room for the user to specify a version? What does
specifying a version mean? This is the question we need to clarify.
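To make the convention above concrete, here is a minimal sketch of a writer-side version constant and a reader-side compatibility check. Every name in it (`StatsVersionSketch`, `CURRENT_VERSION`, `OLDEST_SUPPORTED_VERSION`, `checkReadable`) is hypothetical and is not taken from the Drill code base. The value 3 and the "one version back" window are assumptions that mirror the example in the comment.

```java
/** Hypothetical sketch only; none of these names exist in Drill. */
final class StatsVersionSketch {
    // Version stamped into every stats file written by this build.
    // Assumption: bumped by hand whenever the stats format changes.
    static final int CURRENT_VERSION = 3;

    // Oldest version the reader still accepts: one version back,
    // so files written before an upgrade remain readable.
    static final int OLDEST_SUPPORTED_VERSION = CURRENT_VERSION - 1;

    /** The writer has no choice to make: it always writes the compiled-in version. */
    static int versionToWrite() {
        return CURRENT_VERSION;
    }

    /** True if a file carrying this version can be read by this build. */
    static boolean isSupported(int fileVersion) {
        return fileVersion >= OLDEST_SUPPORTED_VERSION
            && fileVersion <= CURRENT_VERSION;
    }

    /** Reader-side check, run after the version has been read from the file. */
    static void checkReadable(int fileVersion) {
        if (!isSupported(fileVersion)) {
            throw new IllegalStateException(
                "Unsupported statistics version " + fileVersion
                + "; this build reads versions " + OLDEST_SUPPORTED_VERSION
                + " through " + CURRENT_VERSION);
        }
    }

    public static void main(String[] args) {
        checkReadable(versionToWrite());     // a freshly written file is readable
        checkReadable(CURRENT_VERSION - 1);  // so is a file from the previous format
        System.out.println("writer version = " + versionToWrite());
    }
}
```

In this arrangement nothing is left for a user-facing option to control, which is the point of the question above: the written version comes from the code, and the accepted versions come from the reader.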