Hi,
We are building out a realtime ingestion pipeline using the Kafka indexing
service for Druid. To achieve better rollup, I tried the Hadoop-based
re-ingestion job described at
http://druid.io/docs/latest/ingestion/update-existing-data.html, which
uses the datasource itself as the input.
When I ran the job, it failed because it tried to read segment metadata
from the druid_segments table rather than from the table I specified in the
metadataUpdateSpec, customprefix_segments.
"metadataUpdateSpec": {
"connectURI": "jdbc:mysql...",
"password": "XXXXXXX",
"segmentTable": "customprefix_segments",
"type": "mysql",
"user": "XXXXXXXX"
},
Looking at the code, I see that the segmentTable specified in the spec is
actually passed in as the pending-segments table (the 3rd constructor
parameter is the pending-segments table and the 4th is the segments table):
https://github.com/apache/incubator-druid/blob/master/indexing-hadoop/src/main/java/org/apache/druid/indexer/updater/MetadataStorageUpdaterJobSpec.java#L92
As a result, the re-ingestion job tries to read from the default segments
table, druid_segments, which isn't present in our setup.
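To make the mix-up concrete, here is a minimal, self-contained sketch
(hypothetical class and field names, not Druid's actual code) of what I
believe is happening: the value intended for the segments table lands in the
pending-segments position, so the segments table silently falls back to its
default name.

// Hypothetical illustration only -- not Druid's actual classes.
public class TablesConfigSketch {

    // Stand-in for a tables config whose constructor takes the
    // pending-segments table name before the segments table name.
    static class TablesConfig {
        final String pendingSegmentsTable;
        final String segmentsTable;

        TablesConfig(String pendingSegmentsTable, String segmentsTable) {
            // Fall back to default names when nothing is supplied, which is
            // what I suspect produces the druid_segments lookup.
            this.pendingSegmentsTable =
                pendingSegmentsTable != null ? pendingSegmentsTable : "druid_pendingSegments";
            this.segmentsTable =
                segmentsTable != null ? segmentsTable : "druid_segments";
        }
    }

    public static void main(String[] args) {
        String segmentTableFromSpec = "customprefix_segments";

        // What I expected the updater job spec to do:
        TablesConfig expected = new TablesConfig(null, segmentTableFromSpec);

        // What it appears to do instead: the spec's segmentTable fills the
        // pending-segments slot, so segments falls back to the default.
        TablesConfig actual = new TablesConfig(segmentTableFromSpec, null);

        System.out.println("expected segments table: " + expected.segmentsTable); // customprefix_segments
        System.out.println("actual   segments table: " + actual.segmentsTable);   // druid_segments
    }
}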
Is this intentional or a bug?
Is there a way to configure the segments table name for this kind of
re-ingestion job?