tuteng commented on a change in pull request #5240: [Doc] Add *File source connector guide*
URL: https://github.com/apache/pulsar/pull/5240#discussion_r327393385
 
 

 ##########
 File path: site2/docs/io-file-source.md
 ##########
 @@ -0,0 +1,69 @@
+---
+id: io-file
+title: File source connector
+sidebar_label: File source connector
+---
+
+The File source connector pulls messages from files in directories and persists the messages to Pulsar topics.
+
+## Configuration
+
+The configuration of the File source connector has the following properties.
+
+### Property
+
+| Name | Type|Required | Default | Description 
+|------|----------|----------|---------|-------------|
+| `inputDirectory` | String|false|  | The input directory to pull files. |
+| `recurse` | Boolean|false |  | Whether to pull files from subdirectory or not.|
+| `keepFile` |Boolean|false | false | If set to true, the file is not deleted after it is processed, which means the file can be picked up continually. |
+| `fileFilter` | String|false| [^\\.].* | The file whose name matches the given regular expression is picked up. |
+| `pathFilter` | String |false |  | If `recurse` is set to true, the subdirectory whose path matches the given regular expression is scanned. |
+| `minimumFileAge` | Integer|false |  | The minimum age that a file can be processed. <br><br>Any file younger than `minimumFileAge` (according to the last modification date) is ignored. |
+| `maximumFileAge` | Long|false | | The maximum age that a file can be processed. <br><br>Any file older than `maximumFileAge` (according to last modification date) is ignored. |
+| `minimumSize` |Integer| false | | The minimum size (in bytes) that a file can be processed. |
+| `maximumSize` | Double|false || The maximum size (in bytes) that a file can be processed. |
+| `ignoreHiddenFiles` |Boolean| false | | Whether the hidden files should be ignored or not. |
+| `pollingInterval`|Long | false |  | Indicates how long to wait before performing a directory listing. |
+| `numWorkers` | Integer | false | 1 | The number of worker threads that process files.<br><br> This allows you to process a larger number of files concurrently. <br><br>However, setting this to a value greater than 1 makes the data from multiple files mixed in the target topic. |
+
 
 Review comment:
   The `inputDirectory` option is required.

   The default value of `recurse` is `true`.

   The default value of `pollingInterval` is `10000L`.

   The default value of `minimumSize` is `1`.

   The default value of `maximumSize` is `Double.MAX_VALUE`.

   The default value of `minimumFileAge` is `0`.

   The default value of `maximumFileAge` is `Long.MAX_VALUE`.

   The default value of `pathFilter` is `null`.

   Reference:
   https://github.com/apache/pulsar/blob/master/pulsar-io/file/src/main/java/org/apache/pulsar/io/file/FileListingThread.java
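
   For illustration only, a connector config that spells out these defaults might look roughly like the sketch below. The top-level `configs:` key and the `/tmp/input` path are assumptions made for this example and are not taken from this PR. `maximumFileAge`, `maximumSize`, and `pathFilter` are left out so that they fall back to the defaults listed above.

   ```yaml
   # Hypothetical example; the `configs:` wrapper and the path are assumptions.
   configs:
     inputDirectory: "/tmp/input"   # required; placeholder path
     recurse: true                  # default per the review comment
     keepFile: false                # default per the table in the diff
     minimumFileAge: 0              # default per the review comment
     minimumSize: 1                 # default per the review comment
     pollingInterval: 10000         # default per the review comment (10000L)
     numWorkers: 1                  # default per the table in the diff
   ```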

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
