GitHub user bhupeshchawda commented on a diff in the pull request:

    https://github.com/apache/apex-malhar/pull/353#discussion_r72930965
  
    --- Diff: docs/operators/enricher.md ---
    @@ -0,0 +1,169 @@
    +POJO Enricher
    +=============
    +
    +## Operator Objective
    +This operator receives a POJO ([Plain Old Java 
Object](https://en.wikipedia.org/wiki/Plain_Old_Java_Object)) as an incoming 
tuple and uses an external source to enrich the data in 
    +the incoming tuple and finally emits the enriched data as a new enriched 
POJO.
    +
    +POJOEnricher supports enrichment from the following external sources:
    +
    +1. **JSON File Based** - Reads a file whose content is stored in JSON 
format into memory and uses it to enrich the data. This can be done using the 
FSLoader implementation.
    +2. **JDBC Based** - Any JDBC store can act as an external source from which 
the enricher requests data for enriching incoming tuples. This can be done using 
the JDBCLoader implementation.
    +
    +POJO Enricher does not hold any state and is **idempotent**, 
**fault-tolerant** and **statically/dynamically partitionable**.
    +
    +## Operator Use Cases
    +1. Bank ***transaction records*** usually contain a customerId. For further 
analysis of a transaction, one wants the customer name and other customer-related 
information. 
    +Such information is present in another database. One could enrich the 
transaction record with customer information using POJOEnricher.
    +2. ***Call Data Records (CDR)*** contain only the mobile/telephone numbers of 
the customer. Customer information is missing in the CDR. POJO Enricher can be used 
to enrich 
    +the CDR with customer data for further analysis.
    +
    +## Operator Information
    +1. Operator location: ***malhar-contrib***
    +2. Available since: ***3.4.0***
    +3. Operator state: ***Evolving***
    +4. Java Packages:
    +    * Operator: 
***[com.datatorrent.contrib.enrich.POJOEnricher](https://www.datatorrent.com/docs/apidocs/com/datatorrent/contrib/enrich/POJOEnricher.html)***
    +    * FSLoader: 
***[com.datatorrent.contrib.enrich.FSLoader](https://www.datatorrent.com/docs/apidocs/com/datatorrent/contrib/enrich/FSLoader.html)***
    +    * JDBCLoader: 
***[com.datatorrent.contrib.enrich.JDBCLoader](https://www.datatorrent.com/docs/apidocs/com/datatorrent/contrib/enrich/JDBCLoader.html)***
    +
    +## Properties, Attributes and Ports
    +### <a name="props"></a>Properties of POJOEnricher
    +| **Property** | **Description** | **Type** | **Mandatory** | **Default 
Value** |
    +| -------- | ----------- | ---- | ------------------ | ------------- |
    +| *includeFields* | List of fields from the database that need to be added to 
the output POJO. | List<String\> | Yes | N/A |
    +| *lookupFields* | List of fields from the input POJO which will form a 
*unique composite* key for querying the database | List<String\> | Yes | N/A |
    +| *store* | Backend Store from which data should be queried for enrichment 
| [BackendStore](#backendStore) | Yes | N/A |
    +| *cacheExpirationInterval* | Cache entry expiry in ms. After this time, 
the lookup to the store will be done again for the given key | int | No | 1 * 60 * 60 * 
1000 (1 hour) |
    +| *cacheCleanupInterval* | Interval in ms at which stale entries are 
removed from the cache. | int | No | 1 * 60 * 60 * 1000 (1 hour) |
    +| *cacheSize* | Number of entries in the cache after which each addition 
triggers LRU-based eviction | int | No | 1000 |
    +
    +#### <a name="backendStore"></a>Properties of FSLoader (BackendStore)
    +| **Property** | **Description** | **Type** | **Mandatory** | **Default 
Value** |
    +| -------- | ----------- | ---- | ------------------ | ------------- |
    +| *fileName* | Path of the file, the data from which will be used for 
enrichment. See [here](#JSONFileFormat) for JSON File format. | String | Yes | 
N/A |
    +
    +
    +#### Properties of JDBCLoader (BackendStore)
    +| **Property** | **Description** | **Type** | **Mandatory** | **Default 
Value** |
    +| -------- | ----------- | ---- | ------------------ | ------------- |
    +| *databaseUrl* | Connection string for connecting to the JDBC store | String | Yes 
| N/A |
    +| *databaseDriver* | JDBC Driver class for connecting to the JDBC store. This 
driver must be on the classpath | String | Yes | N/A |
    +| *tableName* | Name of the table from which data needs to be retrieved | 
String | Yes | N/A |
    +| *connectionProperties* | Comma-separated list of advanced connection 
properties that need to be passed to the JDBC Driver. For example, 
*prop1:val1,prop2:val2* | String | No | null |
    +| *queryStmt* | Select statement which will be used to query the data. 
This is an optional parameter for advanced queries. | String | No | null |
    +
    +
    +
    +### Platform Attributes that influence operator behavior
    +| **Attribute** | **Description** | **Type** | **Mandatory** |
    +| -------- | ----------- | ---- | ------------------ |
    +| *input.TUPLE_CLASS* | TUPLE_CLASS attribute on the input port which tells 
the operator the class of the incoming POJO | Class or FQCN| Yes |
    +| *output.TUPLE_CLASS* | TUPLE_CLASS attribute on the output port which tells 
the operator the class of the POJO which needs to be emitted | Class or FQCN | Yes |
    +
    +
    +### Ports
    +| **Port** | **Description** | **Type** | **Mandatory** |
    +| -------- | ----------- | ---- | ------------------ |
    +| *input* | Tuples which need to be enriched are received on this port | 
Object (POJO) | Yes |
    +| *output* | Tuples that are enriched from the external source are emitted 
on this port | Object (POJO) | No |
    +
    +## Limitations
    +The current POJOEnricher has the following limitations:
    +
    +1. FSLoader loads the file content in memory. Though it loads only the 
composite key and composite value in memory, a very large amount of data would 
bloat the memory and make the operator run out of memory (OOM). If the file is large, 
allocate sufficient memory to the POJOEnricher.
    +2. Incoming POJO should be a subset of outgoing POJO.
    --- End diff --
    
    Is this necessary?
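
For context, here is a minimal sketch of what limitation 2 appears to require, assuming "subset" means that every field of the incoming POJO also exists on the outgoing POJO. The classes `Transaction` and `EnrichedTransaction` and the helper `missingInOutput` are hypothetical illustrations, not part of the operator or of Malhar; the check uses plain Java reflection only.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class SubsetCheck {
    // Hypothetical input POJO: a transaction that only carries the customer id.
    public static class Transaction {
        public long customerId;
        public double amount;
    }

    // Hypothetical output POJO: all Transaction fields plus one enriched field.
    public static class EnrichedTransaction {
        public long customerId;
        public double amount;
        public String customerName;
    }

    // Collects the declared field names of a class, sorted for stable output.
    public static Set<String> fieldNames(Class<?> type) {
        return Arrays.stream(type.getDeclaredFields())
            .filter(f -> !f.isSynthetic())
            .map(Field::getName)
            .collect(Collectors.toCollection(TreeSet::new));
    }

    // Returns the input fields that have no same-named field on the output class.
    // An empty result means the input fields are a subset of the output fields.
    public static Set<String> missingInOutput(Class<?> in, Class<?> out) {
        Set<String> missing = fieldNames(in);
        missing.removeAll(fieldNames(out));
        return missing;
    }

    public static void main(String[] args) {
        // Input is a subset of output: nothing is missing.
        System.out.println(missingInOutput(Transaction.class, EnrichedTransaction.class));
        // Reversed roles: customerName has nowhere to go.
        System.out.println(missingInOutput(EnrichedTransaction.class, Transaction.class));
    }
}
```

Under that reading, the first call prints `[]` and the second prints `[customerName]`. Whether the operator actually enforces this by field name is an assumption here, not something the doc states.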

