xinyuwan commented on issue #337:
URL: https://github.com/apache/incubator-datasketches-java/issues/337#issuecomment-718160923


   Thanks @jmalkin @leerho for the quick response. Let me add more context to 
the issue:
   
   **Problem**: We hit this ArrayIndexOutOfBoundsException non-deterministically.
The same input may fail once and succeed later, and I cannot reproduce the
error locally when I make individual calls to getCompactSketch().
   **Library version**: 
org.apache.datasketches:datasketches-java:1.3.0-incubating
   **Use case**: Here is a description on how we are using the sketch:
   
   1. We aggregate reach metrics from minute granularity to hourly and then to
daily granularity. We do this by inserting UUIDs into an UpdateSketch and
serializing its compact form into a Protobuf ByteString.
   2. In the minute-to-hour and hour-to-day aggregations, we deserialize each
ByteString back into a Sketch and update a Union with it.
   3. Once all minutes of one hour (or all hours of one day) have been added to
the Union, we call Union.getResult() and serialize the result into a Protobuf
ByteString again. The error occurs only during the hour-to-day
Union.getResult(), and non-deterministically (I am not sure whether this is
because the sketches merged into the Union are larger at that stage). The error
rate is about 5% of total requests.
   4. Throughout the aggregation, we use Nominal Entries (K) = 1024 for both
the UpdateSketch and the Union.
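
To make the shape of the roll-up in steps 1 to 3 concrete, here is a minimal sketch that uses exact `HashSet`s as a stand-in for the sketches. It is not the DataSketches API: in the real pipeline each set would be a serialized compact sketch and `merge` would be a Union with K = 1024. The class name `ReachRollup` and its methods are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Exact-set stand-in for the minute -> hour -> day roll-up (hypothetical, not DataSketches). */
final class ReachRollup {

    /** Merges finer-grained buckets into one coarser bucket (the role the Union plays). */
    static Set<String> merge(List<Set<String>> buckets) {
        Set<String> merged = new HashSet<>();
        for (Set<String> bucket : buckets) {
            merged.addAll(bucket);
        }
        return merged;
    }

    /** Groups minute buckets into hours of 60, merges each hour, then merges the hours. */
    static Set<String> rollUpDay(List<Set<String>> minuteBuckets) {
        List<Set<String>> hours = new ArrayList<>();
        for (int start = 0; start < minuteBuckets.size(); start += 60) {
            int end = Math.min(start + 60, minuteBuckets.size());
            hours.add(merge(minuteBuckets.subList(start, end)));
        }
        return merge(hours);
    }
}
```

The hour-to-day merge is the second `merge` call in `rollUpDay`, which corresponds to the stage where we see the failure.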
   
   Here are some code snippets:
   1. We have a SingleEntityUnionAccumulator, which takes a counter-key enum
and the ByteString of a sketch (a compacted update sketch):
   ```
   public class SingleEntityUnionAccumulator {
   
       final SketchOperations sketchOperations;
       final Map<AdImpressionStatsCounter, ReachData> reach;
   
       public SingleEntityUnionAccumulator(@Nonnull final SketchOperations sketchOperations) {
           super(sketchOperations);
       }
   
       public void accumulate(
               final AdImpressionStatsCounter counterKey, final ByteString sketchBytes) {
           ReachData reachData = putReachDataIfAbsent(counterKey);
           Sketch sketch = sketchOperations.byteStringToSketch(sketchBytes);
           reachData.getUnion().update(sketch);
       }
   
       public Optional<SingleEntityReachData> toReachData() {
           // return an empty Optional if there is no reach data to write to BT
           if (MapUtils.isEmpty(this.getReach())) {
               return Optional.empty();
           }
   
           Map<Integer, ReachDataEntry> dataEntryMap =
                   this.getReach().entrySet().stream()
                           .collect(
                                   toMap(
                                           e -> e.getKey().getHash(),
                                           e -> e.getValue().toReachDataEntry()));
           return Optional.of(
                   SingleEntityReachData.newBuilder()
                           .putAllDataEntryByCounter(dataEntryMap)
                           .setEntityHierarchy(this.getHierarchy())
                           .build());
       }
   }
   ```
   2. The toReachData() method is where we see the exception thrown,
specifically from e -> e.getValue().toReachDataEntry(), which calls
Union.getResult():
   ```
       public ReachDataEntry toReachDataEntry() {
           return ReachDataEntry.newBuilder()
                   .setSketch(sketchOperations.sketchToByteString(getCompactSketch()))
                   .setSeedValue(seedValue)
                   .build();
       }
   
       public CompactSketch getCompactSketch() {
           return this.union.getResult();
       }
   
   ```
   3. SketchOperations is a helper class that handles all SerDe of sketches and
unions. In this case:
   ```
       @Override
       public ByteString sketchToByteString(final Sketch sketch) {
           return ByteString.copyFrom(sketch.compact().toByteArray());
       }
   
       @Override
       public Sketch byteStringToSketch(final ByteString sketchBytes) {
           return Sketches.wrapSketch(Memory.wrap(sketchBytes.toByteArray()));
       }
   
   ```
   
   I'm not sure whether an ArrayIndexOutOfBoundsException indicates that
something is wrong on the memory/heap side, but could you let us know whether
that could be a cause during Union.getResult()?
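
Since individual local calls do not reproduce the failure, one option is a small stress loop that repeats the operation many times and counts how often it throws. The following is only a sketch under that assumption: `FlakyRepro` and `countFailures` are hypothetical names, and the `Runnable` passed in would wrap whatever builds the Union and calls getResult().

```java
/** Hypothetical helper for measuring how often a non-deterministic operation fails. */
final class FlakyRepro {

    /**
     * Runs the operation the given number of times and returns how many runs
     * threw ArrayIndexOutOfBoundsException. Other exceptions propagate.
     */
    static int countFailures(Runnable operation, int attempts) {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                operation.run();
            } catch (ArrayIndexOutOfBoundsException e) {
                failures++;
            }
        }
        return failures;
    }
}
```

With the roughly 5% error rate we see in production, a few hundred attempts on the same input should show at least some failures if the problem is reproducible in a single thread.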
   
   

