Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17902#discussion_r119649224
  
    --- Diff: 
common/kvstore/src/main/java/org/apache/spark/kvstore/LevelDBTypeInfo.java ---
    @@ -0,0 +1,502 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.kvstore;
    +
    +import java.lang.reflect.Array;
    +import java.lang.reflect.Field;
    +import java.lang.reflect.Method;
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.Map;
    +import static java.nio.charset.StandardCharsets.UTF_8;
    +
    +import com.google.common.base.Preconditions;
    +import com.google.common.base.Throwables;
    +import org.iq80.leveldb.WriteBatch;
    +
    +/**
    + * Holds metadata about app-specific types stored in LevelDB. Serves as a 
cache for data collected
    + * via reflection, to make it cheaper to access it multiple times.
    + *
    + * <p>
    + * The hierarchy of keys stored in LevelDB looks roughly like the 
following. This hierarchy ensures
    + * that iteration over indices is easy, and that updating values in the 
store is not overly
    + * expensive. Of note, indices choose using more disk space (one value per 
key) instead of keeping
    + * lists of pointers, which would be more expensive to update at runtime.
    + * </p>
    + *
    + * <p>
    + * Indentation defines when a sub-key lives under a parent key. In 
LevelDB, this means the full
    + * key would be the concatenation of everything up to that point in the 
hierarchy, with each
    + * component separated by a NULL byte.
    + * </p>
    + *
    + * <pre>
    + * +TYPE_NAME
    + *   NATURAL_INDEX
    + *     +NATURAL_KEY
    + *     -
    + *   -NATURAL_INDEX
    + *   INDEX_NAME
    + *     +INDEX_VALUE
    + *       +NATURAL_KEY
    + *     -INDEX_VALUE
    + *     .INDEX_VALUE
    + *       CHILD_INDEX_NAME
    + *         +CHILD_INDEX_VALUE
    + *           NATURAL_KEY_OR_DATA
    + *         -
    + *   -INDEX_NAME
    + * </pre>
    + *
    + * <p>
    + * Entity data (either the entity's natural key or a copy of the data) is 
stored in all keys
    + * that end with "+<something>". A count of all objects that match a 
particular top-level index
    + * value is kept at the end marker ("-<something>"). A count is also kept 
at the natural index's end
    + * marker, to make it easy to retrieve the number of all elements of a 
particular type.
    + * </p>
    + *
    + * <p>
    + * To illustrate, given a type "Foo", with a natural index and a second 
index called "bar", you'd
    + * have these keys and values in the store for two instances, one with 
natural key "key1" and the
    + * other "key2", both with value "yes" for "bar":
    + * </p>
    + *
    + * <pre>
    + * Foo __main__ +key1   [data for instance 1]
    + * Foo __main__ +key2   [data for instance 2]
    --- End diff --
    
    one thing I had some trouble keeping straight as I read through this was 
the difference between an index "key" and an index "value".  Normally i think 
of "value" as what you are calling "data" here.  It seems like you are calling 
the index value just final component "+key1", while the key refers to the 
entire thing "Foo __main__ +key1".  
    
    Is that right?
    
    It might also help adding comments on `entityKey()` and `getValue()` as well


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to