#troubleshooting


@michael: I've noticed that if I delete segments from the UI, it only removes them from ZK but not from deep storage. The next time I run an ingestion job for the table unrelated to the deleted segments it re-adds them to the table. Is this expected? Am I missing something?
  @g.kishore: I don't think the delete call deletes it from the deep store. We delete it from the deep store only when the retention manager kicks in (which is based on the retention set in the table config)
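  For context, the retention the retention manager applies is configured per table under `segmentsConfig` in the table config. Below is a minimal sketch of reading and updating it through the controller REST API; the controller address, table name, and exact endpoint paths are assumptions and may differ across Pinot releases.
  ```
  import requests

  CONTROLLER = "http://localhost:9000"  # assumed controller address
  TABLE = "mm"                          # offline table from this thread

  # Fetch the current table config; the offline config is returned under the "OFFLINE" key.
  current = requests.get(f"{CONTROLLER}/tables/{TABLE}").json()
  table_config = current["OFFLINE"]

  # The retention manager purges segments older than this from servers and deep store.
  table_config["segmentsConfig"]["retentionTimeUnit"] = "DAYS"
  table_config["segmentsConfig"]["retentionTimeValue"] = "30"

  # Push the updated config back to the controller.
  resp = requests.put(f"{CONTROLLER}/tables/{TABLE}", json=table_config)
  resp.raise_for_status()
  ```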
  @michael: Is it expected that the ingestion job adds the deleted segments back? The deleted segments even have a different prefix than the one the job is running with
  @g.kishore: > The next time I run an ingestion job for the table unrelated to the deleted segments it re-adds them to the table. Is this expected? Am I missing something?
  This is not expected. Can you show the segments list and the ingestion job spec?
  @michael:
  ```
  executionFrameworkSpec:
    name: 'standalone'
    segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
    segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
  jobType: SegmentCreationAndUriPush
  inputDirURI: ''
  includeFileNamePattern: 'glob:**/*.parquet'
  outputDirURI: ''
  segmentCreationJobParallelism: 4
  overwriteOutput: true
  pinotFSSpecs:
    - scheme: s3
      className: org.apache.pinot.plugin.filesystem.S3PinotFS
      configs:
        region: 'us-east-1'
        endpoint: ''
        accessKey: 'pinot'
        secretKey: 'pinot!!!'
  recordReaderSpec:
    dataFormat: 'parquet'
    className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
  tableSpec:
    tableName: 'mm'
    schemaURI: ''
    tableConfigURI: ''
  pinotClusterSpecs:
    - controllerURI: ''
  pushJobSpec:
    pushAttempts: 2
    pushRetryIntervalMillis: 1000
  segmentNameGeneratorSpec:
    type: normalizedDate
    configs:
      segment.name.prefix: 'mm_batch_test'
  ```
  @michael:
  ```
  {
    "id": "mm_OFFLINE",
    "simpleFields": {
      "BATCH_MESSAGE_MODE": "false",
      "IDEAL_STATE_MODE": "CUSTOMIZED",
      "INSTANCE_GROUP_TAG": "mm_OFFLINE",
      "MAX_PARTITIONS_PER_INSTANCE": "1",
      "NUM_PARTITIONS": "3",
      "REBALANCE_MODE": "CUSTOMIZED",
      "REPLICAS": "1",
      "STATE_MODEL_DEF_REF": "SegmentOnlineOfflineStateModel",
      "STATE_MODEL_FACTORY_NAME": "DEFAULT"
    },
    "mapFields": {
      "mm_batch_test_2020-11-19_2020-11-19_0": { "Server_172.20.0.6_8098": "ONLINE" },
      "mm_batch_test_2020-11-19_2020-11-19_1": { "Server_172.20.0.6_8098": "ONLINE" },
      "mm_batch_test_2020-11-19_2020-11-19_2": { "Server_172.20.0.6_8098": "ONLINE" }
    },
    "listFields": {}
  }
  ```
  @michael: after running the job:
  @michael:
  ```
  {
    "id": "mm_OFFLINE",
    "simpleFields": {
      "BATCH_MESSAGE_MODE": "false",
      "IDEAL_STATE_MODE": "CUSTOMIZED",
      "INSTANCE_GROUP_TAG": "mm_OFFLINE",
      "MAX_PARTITIONS_PER_INSTANCE": "1",
      "NUM_PARTITIONS": "7",
      "REBALANCE_MODE": "CUSTOMIZED",
      "REPLICAS": "1",
      "STATE_MODEL_DEF_REF": "SegmentOnlineOfflineStateModel",
      "STATE_MODEL_FACTORY_NAME": "DEFAULT"
    },
    "mapFields": {
      "mm_batch1_test_2020-11-19_2020-11-19_0": { "Server_172.20.0.6_8098": "ONLINE" },
      "mm_batch1_test_2020-11-19_2020-11-19_1": { "Server_172.20.0.6_8098": "ONLINE" },
      "mm_batch2_test_2020-11-19_2020-11-19_0": { "Server_172.20.0.6_8098": "ONLINE" },
      "mm_batch2_test_2020-11-19_2020-11-19_1": { "Server_172.20.0.6_8098": "ONLINE" },
      "mm_batch_test_2020-11-19_2020-11-19_0": { "Server_172.20.0.6_8098": "ONLINE" },
      "mm_batch_test_2020-11-19_2020-11-19_1": { "Server_172.20.0.6_8098": "ONLINE" },
      "mm_batch_test_2020-11-19_2020-11-19_2": { "Server_172.20.0.6_8098": "ONLINE" }
    },
    "listFields": {}
  }
  ```
  @michael: picked up the old deleted segments from other batch jobs
  @michael: I was expecting it to just replace the existing mm_batch_test segments
  @g.kishore: Maybe because the old segments are still there in the output dir of the ingestion job?
  @michael: yes I see that
  @michael: what's the purpose of the ingestion job output dir, and why are the output files left there?
  @g.kishore: I don’t see any reason
  @g.kishore: We should delete it... also, in Spark mode the task directory gets deleted automatically after the task is run... that's probably why we don't delete it explicitly
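  Until that cleanup happens automatically, a practical workaround is to clear leftover segment tarballs from the job's `outputDirURI` before re-running the push. A sketch using boto3 against S3; the bucket and prefix below are placeholders standing in for whatever `outputDirURI` points to.
  ```
  import boto3

  # Placeholder bucket/prefix standing in for the job's outputDirURI; adjust to your setup.
  BUCKET = "pinot-batch"
  OUTPUT_PREFIX = "mm/output/"

  s3 = boto3.client("s3", region_name="us-east-1")

  # Remove previously generated segment tarballs so a SegmentCreationAndUriPush run
  # does not re-push segments left over from earlier jobs.
  paginator = s3.get_paginator("list_objects_v2")
  for page in paginator.paginate(Bucket=BUCKET, Prefix=OUTPUT_PREFIX):
      for obj in page.get("Contents", []):
          if obj["Key"].endswith(".tar.gz"):
              print("deleting", obj["Key"])
              s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
  ```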
  @g.kishore: Mind filing an issue?
  @michael: Sure
  @michael: thank you
  @ssubrama: The delete API should delete it right away, not just when the retention manager kicks in. It will move it to the deleted folder inside your deep store (I forget whether this is based on config), where it will reside for some number of days and then be removed. It is a bug if the segments still show up on a new table. It is possible if the new table is added within a very short time of deletion, because it takes a few seconds for the segments to be deleted and for the Helix external view to stabilize. So, we always advise creating tables with a different name
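  The delete described above can also be issued against the controller REST API instead of the UI; a minimal sketch, assuming a controller at `localhost:9000` and one of the segment names from this thread (the exact endpoint path can vary slightly between Pinot versions).
  ```
  import requests

  CONTROLLER = "http://localhost:9000"  # assumed controller address
  TABLE = "mm"
  SEGMENT = "mm_batch_test_2020-11-19_2020-11-19_0"

  # Ask the controller to delete one segment from the offline table; the segment file in
  # deep storage should then be moved under the Deleted_Segments folder rather than erased.
  resp = requests.delete(f"{CONTROLLER}/segments/{TABLE}/{SEGMENT}", params={"type": "OFFLINE"})
  resp.raise_for_status()
  print(resp.json())
  ```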
  @michael: I was deleting individual segments of the table and never saw them removed from deep storage. Are the servers responsible for the deletion task?
  @ssubrama: Nope, they should get deleted when you delete the segments. Like I said, they are moved into a folder called `Deleted_Segments/tableName`
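  One way to confirm that behaviour is to look at the deep store directly; a sketch assuming S3 deep storage, with a placeholder bucket and the `Deleted_Segments/tableName` layout mentioned above.
  ```
  import boto3

  # Placeholder deep-store location; substitute the bucket/prefix your controller data dir uses.
  BUCKET = "pinot-deepstore"
  PREFIX = "Deleted_Segments/mm/"

  s3 = boto3.client("s3", region_name="us-east-1")
  resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)

  # Deleted segments should show up here for a few days before they are cleaned up for good.
  for obj in resp.get("Contents", []):
      print(obj["Key"], obj["Size"])
  ```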