hudi-bot opened a new issue, #14684:
URL: https://github.com/apache/hudi/issues/14684

   Since DeltaStreamer makes heavy use of file listing, a source that contains 
many tiny files can quickly become a bottleneck. We need a way to delete or 
archive source files once DeltaStreamer has processed them. 
   
   The most reliable point to clean up the source appears to be right after 
DeltaSync successfully commits the checkpoint. We could add a new public method 
to Source, e.g. `postCommit()`, and invoke it after each successful commit.
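
To make the proposal concrete, here is a minimal sketch of how such a hook could behave. Everything in it is hypothetical: `CleanableSource`, `ArchivingFileSource`, and `syncOnce` are illustrative names and are not Hudi's actual `Source` or `DeltaSync` classes. The `commitSucceeds` flag stands in for the real write-and-checkpoint step. The only point it tries to show is that files are archived after a successful commit and left untouched otherwise.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PostCommitSketch {

  /** Hypothetical stand-in for a file-based source; NOT Hudi's real Source class. */
  interface CleanableSource {
    List<Path> fetchNextBatch() throws IOException;

    /** Called only after the checkpoint for the last batch was committed. */
    void postCommit() throws IOException;
  }

  /** Lists files in sourceDir and moves them to archiveDir once committed. */
  static class ArchivingFileSource implements CleanableSource {
    private final Path sourceDir;
    private final Path archiveDir;
    private List<Path> lastBatch = new ArrayList<>();

    ArchivingFileSource(Path sourceDir, Path archiveDir) {
      this.sourceDir = sourceDir;
      this.archiveDir = archiveDir;
    }

    @Override
    public List<Path> fetchNextBatch() throws IOException {
      // Non-recursive listing; remember the batch so postCommit knows what to move.
      try (Stream<Path> files = Files.list(sourceDir)) {
        lastBatch = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
      }
      return lastBatch;
    }

    @Override
    public void postCommit() throws IOException {
      Files.createDirectories(archiveDir);
      for (Path file : lastBatch) {
        Files.move(file, archiveDir.resolve(file.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
      }
      lastBatch = new ArrayList<>();
    }
  }

  /** One sync round: read a batch, "commit", and clean up only on success. */
  static boolean syncOnce(CleanableSource source, boolean commitSucceeds) throws IOException {
    List<Path> batch = source.fetchNextBatch();
    if (batch.isEmpty() || !commitSucceeds) {
      return false; // nothing to do, or commit failed: source files stay in place
    }
    source.postCommit();
    return true;
  }

  public static void main(String[] args) throws IOException {
    Path src = Files.createTempDirectory("src");
    Path archive = Files.createTempDirectory("archive");
    Files.write(src.resolve("a.json"), "{}".getBytes());

    ArchivingFileSource source = new ArchivingFileSource(src, archive);
    System.out.println("failed commit cleaned up: " + syncOnce(source, false));
    System.out.println("successful commit cleaned up: " + syncOnce(source, true));
  }
}
```

Archiving rather than deleting keeps a failed or replayed run recoverable. A real implementation would also need to decide how to handle a crash between the commit and the cleanup, which this sketch does not address.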
   
   Reference:
   
   
[https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#input-sources]
   
    
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-1348
   - Type: Improvement
   
   
   ---
   
   
   ## Comments
   
   21/Oct/20 12:59;vho;[~vinoth] what do you think about this? I'm not sure if 
there's already a solution ;;;
   
   ---
   
   24/Oct/20 02:12;vinoth;[~vho] thanks for bringing this up. It seems valuable 
to support such an option. 
   I am not aware of any other work along these lines. [~bhasudha] is looking 
into parallelism for input source listing;;;
   
   ---
   
   08/Aug/21 20:17;githubbot;hudi-bot edited a comment on pull request #2210:
   URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641
   
   
      ## CI report:
      
      * b845e34d11e4e44e2b41e2089349baddc3a10b80 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=210)
 
      
      <details>
      <summary>Bot commands</summary>
        The @flinkbot bot supports the following commands:
      
       - `@flinkbot run travis` re-run the last Travis build
       - `@flinkbot run azure` re-run the last Azure build
      </details>
   
   
   -- 
   This is an automated message from the Apache Git Service.
   To respond to the message, please log on to GitHub and use the
   URL above to go to the specific comment.
   
   To unsubscribe, e-mail: [email protected]
   
   For queries about this service, please contact Infrastructure at:
   [email protected]
   ;;;
   
   ---
   
   09/Aug/21 04:21;githubbot;hudi-bot edited a comment on pull request #2210:
   URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641
   
   
      ## CI report:
      
      * b845e34d11e4e44e2b41e2089349baddc3a10b80 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=210)
 
      
   ;;;
   
   ---
   
   11/Aug/21 22:25;githubbot;hudi-bot edited a comment on pull request #2210:
   URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641
   
   
      ## CI report:
      
      * b845e34d11e4e44e2b41e2089349baddc3a10b80 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=210)
 
      * a174c4ed2b4c13a032a38afdb0a21b58a7b6cf25 UNKNOWN
      
   ;;;
   
   ---
   
   11/Aug/21 22:28;githubbot;hudi-bot edited a comment on pull request #2210:
   URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641
   
   
      ## CI report:
      
      * b845e34d11e4e44e2b41e2089349baddc3a10b80 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=210)
      * a174c4ed2b4c13a032a38afdb0a21b58a7b6cf25 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1668)
 
      
   ;;;
   
   ---
   
   11/Aug/21 23:27;githubbot;hudi-bot edited a comment on pull request #2210:
   URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641
   
   
      ## CI report:
      
      * a174c4ed2b4c13a032a38afdb0a21b58a7b6cf25 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1668)
 
      
   ;;;

