xinyuwan commented on issue #337:
URL:
https://github.com/apache/incubator-datasketches-java/issues/337#issuecomment-718160923
Thanks @jmalkin @leerho for the quick response. Let me add more context to
the issue:
**Problem**: we encounter this ArrayIndexOutOfBoundsException
non-deterministically. The same input may fail once and succeed later, and I
cannot reproduce the error locally when I make individual calls to
getCompactSketch().
**Library version**:
org.apache.datasketches:datasketches-java:1.3.0-incubating
**Use case**: Here is a description of how we are using the sketch:
1. We are aggregating reach metrics from minute granularity to hourly and
then to daily granularity. We do this by inserting UUIDs into an UpdateSketch
and serializing its compact form into a Protobuf ByteString.
2. In the minute-to-hour and hour-to-day aggregations, we deserialize each
ByteString back into a Sketch and update a Union with it.
3. Once all minutes of one hour (or all hours of one day) have been added to
the Union, we call Union.getResult() and serialize the result into a Protobuf
ByteString again. The error occurs only during the hour-to-day
Union.getResult(), and non-deterministically (we are not sure whether this is
because the sketches merged into the Union are larger at this stage). The
error rate is about 5% of total requests.
4. Throughout the aggregation, we use Nominal Entries (K) = 1024 for both
UpdateSketch and Union.
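As a rough illustration of steps 1-3 (this is not our production code and does not use the DataSketches API), the roll-up can be sketched with only the JDK. A plain `Set<Long>` of hashes stands in for the theta sketch, so it is exact rather than approximate, and every name here (`RollupSketch`, `serialize`, `deserialize`, `unionAll`, `hash`) is hypothetical:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;

public class RollupSketch {

    // Stand-in for "compact the sketch and serialize it": count, then each hash.
    public static byte[] serialize(Set<Long> hashes) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + hashes.size() * Long.BYTES);
        buf.putInt(hashes.size());
        for (long h : hashes) {
            buf.putLong(h);
        }
        return buf.array();
    }

    // Stand-in for "wrap the bytes back into a sketch".
    public static Set<Long> deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int n = buf.getInt();
        Set<Long> hashes = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            hashes.add(buf.getLong());
        }
        return hashes;
    }

    // Stand-in for "update a Union with every part, then getResult() and serialize".
    public static byte[] unionAll(List<byte[]> parts) {
        Set<Long> union = new TreeSet<>();
        for (byte[] part : parts) {
            union.addAll(deserialize(part));
        }
        return serialize(union);
    }

    // Toy hash of a UUID; a real sketch applies its own hash function.
    public static long hash(UUID id) {
        return id.getMostSignificantBits() ^ id.getLeastSignificantBits();
    }

    // Builds one serialized "minute" from fixed UUIDs so the run is repeatable.
    static byte[] minute(long... ids) {
        Set<Long> hashes = new TreeSet<>();
        for (long id : ids) {
            hashes.add(hash(new UUID(0L, id)));
        }
        return serialize(hashes);
    }

    public static void main(String[] args) {
        // Minute -> hour: users 1,2 and 2,3 in hour one; 3,4 and 5 in hour two.
        byte[] hourOne = unionAll(Arrays.asList(minute(1, 2), minute(2, 3)));
        byte[] hourTwo = unionAll(Arrays.asList(minute(3, 4), minute(5)));
        // Hour -> day: the step where we see the exception in the real pipeline.
        byte[] day = unionAll(Arrays.asList(hourOne, hourTwo));
        System.out.println("distinct users in day: " + deserialize(day).size());
    }
}
```

In the real pipeline each stage is the same shape, only with UpdateSketch, Union, and a fixed K of 1024 in place of the exact set.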
Here are some code snippets:
1. We have SingleEntityUnionAccumulator, which takes a counter key enum and
the ByteString of a sketch (a compacted update sketch):
```
public class SingleEntityUnionAccumulator {
  final SketchOperations sketchOperations;
  final Map<AdImpressionStatsCounter, ReachData> reach;

  public SingleEntityUnionAccumulator(@Nonnull final SketchOperations sketchOperations) {
    super(sketchOperations);
  }

  public void accumulate(
      final AdImpressionStatsCounter counterKey, final ByteString sketchBytes) {
    ReachData reachData = putReachDataIfAbsent(counterKey);
    Sketch sketch = sketchOperations.byteStringToSketch(sketchBytes);
    reachData.getUnion().update(sketch);
  }

  public Optional<SingleEntityReachData> toReachData() {
    // return empty if there is no reach data to write to BT
    if (MapUtils.isEmpty(this.getReach())) {
      return Optional.empty();
    }
    Map<Integer, ReachDataEntry> dataEntryMap =
        this.getReach().entrySet().stream()
            .collect(
                toMap(
                    e -> e.getKey().getHash(),
                    e -> e.getValue().toReachDataEntry()));
    return Optional.of(
        SingleEntityReachData.newBuilder()
            .putAllDataEntryByCounter(dataEntryMap)
            .setEntityHierarchy(this.getHierarchy())
            .build());
  }
}
```
2. The toReachData() method is where the exception is thrown, specifically in
e -> e.getValue().toReachDataEntry(), which calls Union.getResult():
```
public ReachDataEntry toReachDataEntry() {
  return ReachDataEntry.newBuilder()
      .setSketch(sketchOperations.sketchToByteString(getCompactSketch()))
      .setSeedValue(seedValue)
      .build();
}

public CompactSketch getCompactSketch() {
  return this.union.getResult();
}
```
3. SketchOperations is a helper class that handles all serialization and
deserialization of sketches and unions. In this case:
```
@Override
public ByteString sketchToByteString(final Sketch sketch) {
  return ByteString.copyFrom(sketch.compact().toByteArray());
}

@Override
public Sketch byteStringToSketch(final ByteString sketchBytes) {
  return Sketches.wrapSketch(Memory.wrap(sketchBytes.toByteArray()));
}
```
I'm not sure whether the ArrayIndexOutOfBoundsException indicates that
something is wrong on the memory/heap side, but can you let us know whether
that could be a cause during Union.getResult()?