[ https://issues.apache.org/jira/browse/SOLR-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908084#comment-13908084 ]

Manjunath commented on SOLR-2943:
---------------------------------

Nice feature to have :-). Waiting to use it :-)

> DIHCacheWriter & DIHCacheProcessor (entity processor)
> -----------------------------------------------------
>
>                 Key: SOLR-2943
>                 URL: https://issues.apache.org/jira/browse/SOLR-2943
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0-ALPHA
>            Reporter: James Dyer
>            Priority: Minor
>             Fix For: 4.7
>
>         Attachments: SOLR-2943.patch, SOLR-2943.patch, SOLR-2943.patch
>
>
> This is a spin-off of SOLR-2382.
> Currently DIH requires users to retrieve, join and index all data for a full 
> or delta update in one big step.  This issue would let us break that work 
> into individual steps.  The idea is to have multiple "data-config.xml" files, 
> some of which retrieve and cache data while others join and index data.  
> This is useful when Solr records are assembled from several data 
> sources.  With this feature, each data source can be retrieved and cached 
> separately.  Once all data sources have been retrieved, they can be joined 
> and indexed in a final step.  When doing a delta update, only the data 
> sources that change need to have their caches updated (or frequently-changing 
> data can remain un-cached while caching the more static data).  This is 
> particularly useful given that Lucene/Solr cannot do a true 
> "update" operation.  DIH Caches also provide a handy way to archive source 
> data for which there is no stable system-of-record.
> Implementation Details:
> - The DIHCacheWriter allows us to write the final (root entity) DIH output to 
> a DIHCache rather than to Solr.  Caches can be created from scratch 
> ("full-update") or existing caches can be modified ("delta-update").
> - The DIHCacheProcessor is an Entity Processor that reads a DIHCache.  This 
> Entity Processor can be used for both Root Entities and Child Entities.  
> Cached data can be read back, joined to other Entities and indexed.
> - Both DIHCacheWriter and DIHCacheProcessor support partitioning.  
> DIHCacheWriter can write to a partitioned cache while DIHCacheProcessor can 
> read back a particular partition.  This can be handy when indexing to 
> multiple shards.
> - This patch is 100% stand-alone from the rest of DIH, so while users can 
> patch and rebuild the DIH .jar file to include these classes, doing so is 
> unnecessary.  To use this functionality, simply put the code here on the 
> classpath (e.g., in SOLR_HOME/lib).
> - In addition to this patch, a persistent cache implementation is required. 
>   - See SOLR-2948 for a DIH Cache Implementation built on Lucene (no 
> additional dependencies). 
>   - See SOLR-2613 for a DIH Cache Implementation backed by BDB-JE (we use 
> this in production).
>   - Other Cache Implementations (hopefully) will be developed in the future 
> and become available for general use.
> - This patch includes extensive unit tests.  A MockDIHCache that supports 
> persistence and delta updates facilitates the tests.  Do not attempt to use 
> MockDIHCache for anything other than testing or as a reference for developing 
> your own DIHCache implementations.
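
For readers new to the idea, here is a minimal in-memory sketch of the flow described above: each source is cached separately (full or delta update), the root cache can be read back one partition at a time, and a final step joins the caches into documents. It is an illustration under assumptions, not code from the patch: `KeyedCache`, `full_update`, `delta_update`, `read_partition` and `join_docs` are hypothetical names, a plain dict stands in for a persistent cache such as those in SOLR-2948 or SOLR-2613, and CRC32-modulo-N partitioning is assumed rather than taken from DIHCacheWriter.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, object]


class KeyedCache:
    """Hypothetical in-memory stand-in for a persistent DIH cache.

    Rows are grouped by a key field so a child entity can be looked up by
    its parent's key. This is NOT the DIHCache interface from the patch.
    """

    def __init__(self, key_field: str, num_partitions: int = 1) -> None:
        self.key_field = key_field
        self.num_partitions = num_partitions
        self.rows: Dict[object, List[Record]] = defaultdict(list)

    def full_update(self, records: Iterable[Record]) -> None:
        # "full-update": throw away the old contents and rebuild from scratch.
        self.rows.clear()
        for record in records:
            self.rows[record[self.key_field]].append(record)

    def delta_update(self, changed: Iterable[Record],
                     deleted_keys: Iterable[object] = ()) -> None:
        # "delta-update": each changed key's rows are replaced wholesale by the
        # rows supplied here; keys not mentioned keep their cached rows.
        replacements: Dict[object, List[Record]] = defaultdict(list)
        for record in changed:
            replacements[record[self.key_field]].append(record)
        self.rows.update(replacements)
        for key in deleted_keys:
            self.rows.pop(key, None)

    def partition_of(self, key: object) -> int:
        # Assumed scheme: CRC32 of the key's string form, modulo the partition
        # count. CRC32 is stable across runs, unlike Python's salted hash().
        return zlib.crc32(str(key).encode("utf-8")) % self.num_partitions

    def read_partition(self, partition: int) -> Iterator[Tuple[object, List[Record]]]:
        # Yield only the keys that belong to one partition (e.g. one shard).
        for key, rows in self.rows.items():
            if self.partition_of(key) == partition:
                yield key, rows

    def lookup(self, key: object) -> List[Record]:
        # .get() avoids inserting empty entries into the defaultdict.
        return self.rows.get(key, [])


def join_docs(root: KeyedCache, child: KeyedCache, field: str,
              partition: int = 0) -> List[Record]:
    """Final step: read one partition of the root cache, attach the child
    cache's values for `field` as a multi-valued field, and return the
    documents that would then be sent to Solr."""
    docs: List[Record] = []
    for key, rows in root.read_partition(partition):
        for row in rows:
            doc = dict(row)
            doc[field] = [c[field] for c in child.lookup(key)]
            docs.append(doc)
    return docs


if __name__ == "__main__":
    products = KeyedCache("id")
    products.full_update([{"id": 1, "name": "lamp"}, {"id": 2, "name": "desk"}])
    prices = KeyedCache("id")
    prices.full_update([{"id": 1, "price": 20}, {"id": 2, "price": 150}])
    # Only the prices source changed, so only its cache is touched.
    prices.delta_update([{"id": 2, "price": 120}])
    print(join_docs(products, prices, "price"))
```

In the demo only the prices cache is delta-updated while the products cache is reused as-is, which is the saving the description points to for delta imports. Whatever partitioning scheme is used, the writer and the reader have to agree on it and on the partition count, otherwise a key would be looked for in the wrong partition.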



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)