DarwinKatanamp opened a new issue, #5517:
URL: https://github.com/apache/accumulo/issues/5517
**Describe the bug**
When doing a bulk import into tables that have a bloom filter enabled,
external compactions fail with the following error:
```
compactor_q1 org.apache.accumulo.compactor.Compactor 449 ERROR Compactor thread was interrupted waiting for compaction to start, cancelling job
java.lang.UnsupportedOperationException
	at org.apache.accumulo.core.file.BloomFilterLayer$Reader.estimateOverlappingEntries(BloomFilterLayer.java:434)
	at org.apache.accumulo.compactor.Compactor.estimateOverlappingEntries(Compactor.java:635)
	at org.apache.accumulo.compactor.Compactor$2.lambda$initialize$0(Compactor.java:546)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at org.apache.accumulo.compactor.Compactor$2.initialize(Compactor.java:540)
	at org.apache.accumulo.compactor.Compactor.run(Compactor.java:751)
	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
	at java.base/java.lang.Thread.run(Thread.java:1583)
```
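From the trace, the failure comes from `BloomFilterLayer$Reader`, which wraps an underlying file reader, throwing `UnsupportedOperationException` when the Compactor asks it for an entry estimate. Below is a minimal self-contained sketch of that suspected pattern; every name in it is a hypothetical stand-in (only the exception type matches the real trace), so treat it as an illustration, not Accumulo's actual code:

```java
// All names here are hypothetical stand-ins; only the exception type matches
// the real stack trace.
interface FileReader {
  long estimateOverlappingEntries(String extent);
}

class PlainReader implements FileReader {
  @Override
  public long estimateOverlappingEntries(String extent) {
    return 42; // pretend estimate from the underlying file
  }
}

// Analogous to BloomFilterLayer.Reader: decorates another reader but does not
// forward this particular method, so every caller hits the exception.
class BloomWrapper implements FileReader {
  private final FileReader delegate;

  BloomWrapper(FileReader delegate) {
    this.delegate = delegate;
  }

  @Override
  public long estimateOverlappingEntries(String extent) {
    // Suspected bug shape: throwing instead of
    // `return delegate.estimateOverlappingEntries(extent);`
    throw new UnsupportedOperationException();
  }
}

public class WrapperDemo {
  public static void main(String[] args) {
    FileReader reader = new BloomWrapper(new PlainReader());
    // Mirrors the Compactor iterating over input files to estimate the job
    // size; this line throws UnsupportedOperationException.
    System.out.println(reader.estimateOverlappingEntries("someExtent"));
  }
}
```

If `BloomFilterLayer.java:434` does something like this, forwarding the call to the wrapped reader would presumably avoid the error, but I have not verified that against the source.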
**Versions (OS, Maven, Java, and others, as appropriate):**
- Affected version(s) of this project: 2.1.3
**To Reproduce**
Steps to reproduce the behavior (or a link to an example repository that
reproduces the problem):

Start a local fluo-uno cluster with the default external compactors enabled,
as defined in fluo-uno/install/accumulo-2.1.3/conf/cluster.yaml
(branch: main, commit e8f3ba9; Accumulo version 2.1.3).

Generate local bulk import files with
accumulo-examples/src/main/java/org/apache/accumulo/examples/mapreduce/bulk/BulkIngestExample.java
(branch: 2.1, commit 9d400cd). I disabled the
client.tableOperations().importDirectory call and performed the bulk import in
the Accumulo shell instead, and I changed the default 1k rows to 2M rows; a
sketch of the disabled call follows below.
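For reference, here is a rough sketch of the programmatic bulk import I disabled, assuming the standard 2.x `importDirectory` fluent API; the client properties path is a placeholder, and the directory and table name are taken from the shell commands used below:

```java
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;

public class BulkImportSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder client properties path; adjust for your cluster.
    try (AccumuloClient client = Accumulo.newClient()
        .from("/path/to/accumulo-client.properties").build()) {
      // Standard 2.x bulk import API. The trailing `true` on the shell's
      // `importdirectory` command is, I believe, the same setTime flag as
      // tableTime(true) here.
      client.tableOperations()
          .importDirectory("/tmp/bulkWork/bulkWork/files")
          .to("test1")
          .tableTime(true)
          .load();
    }
  }
}
```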
Then copy the generated files to HDFS:
```
hadoop fs -mkdir -p /tmp/bulkWork
hadoop fs -copyFromLocal /.../accumulo-examples/tmp/bulkWork/ /tmp/bulkWork
```
In the Accumulo shell:
```createtable test1```
Enable the bloom filter:
```config -t test1 -s table.bloom.enabled=true```
Lower the split threshold (this config is very likely not necessary, but I
figured it would help trigger compactions):
```config -t test1 -s table.split.threshold=100K```
Configure external compactions in the shell:
```
config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"large","type":"external","queue":"q1"}]'
config -t test1 -s table.compaction.dispatcher.opts.service=cs1
```
Do the bulk load:
```importdirectory -t test1 /tmp/bulkWork/bulkWork/files true```
Start the compaction in the shell:
```compact -t test1 -w```
This results in the errors shown above appearing in the Monitor.
**Expected behavior**
No errors when externally compacting bulk-loaded, bloom-filter-enabled tables.
**Additional context**
A note unrelated to the problem: I had to disable
``` <arg>-Xlint:all</arg>```
in the root pom.xml to get the project to compile (from a clean clone, with
Java 21).