[ https://issues.apache.org/jira/browse/SOLR-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589495#comment-14589495 ]
Noble Paul commented on SOLR-7188:
----------------------------------

With the schema API in place we can really get the schema information through a REST API.

> Run Data Import Handler processes in a SolrJ client
> ---------------------------------------------------
>
>                 Key: SOLR-7188
>                 URL: https://issues.apache.org/jira/browse/SOLR-7188
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>            Reporter: Ted Sullivan
>            Priority: Minor
>         Attachments: IDEA-AS-CODE.patch, SOLR-7188.patch, SOLR-7188.patch,
> SOLR-7188.patch
>
>
> Adds a DataImportHandlerClient class that wraps an EmbeddedSolrServer and
> adds a DIHCloudWriter implementation of DIHWriter that sends documents to a
> remote SolrCloud cluster. This enables existing DIH processes to run outside
> of the Solr JVM which should enable better scalability.
> The current architecture of DIH imposes several restrictions on scalability.
> First, the DIH runs in the same process space as Solr itself and competes for
> resources (CPU and memory) with normal Solr processes devoted to indexing and
> querying. Second, the DIH cannot be multi-threaded which means that
> parallelizing it requires splitting the processing amongst nodes in a
> SolrCloud cluster. Since the incoming data is sent through an
> UpdateRequestProcessor chain (via the SolrWriter implementation of
> DIHWriter), additional routing is done internally as the documents are
> forwarded to the current shard leader nodes once the ID hash is computed.
> This causes additional network traffic within the SolrCloud cluster. Scaling
> the DIH is limited by the number of nodes in the cluster and any heavy-duty
> processing due to entity processors or transformation elements shares the
> processing resources of Solr itself. This is known to be a source of
> bottlenecks in Solr installations (SolrCloud or Master-Slave) that use DIH.
> The DataImportHandlerClient uses native DIH functionality - DataImporter,
> etc. but can be run externally to Solr. This means that as many processes as
> are needed to achieve necessary performance at scale can be added and the
> processing that occurs within the DataImportHandler is done outside of the
> Solr JVM. The same benefits that accrue with multiple SolrJ clients can now
> be realized with DIH without the necessity of porting code from DIH to a
> SolrJ client.
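
As a rough illustration of the comment about the schema API: a process running outside the Solr JVM could read a collection's schema over HTTP from the Schema API endpoint (`/solr/<collection>/schema`). The sketch below is not part of the attached patches. The class name `SchemaApiSketch`, the base URL `http://localhost:8983/solr`, and the collection name `collection1` are all assumptions made for the example, and it uses the JDK 11+ `java.net.http` client rather than SolrJ.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not from SOLR-7188: reads a schema via the Schema API.
public class SchemaApiSketch {

    // Builds the Schema API endpoint for a collection, tolerating a trailing slash
    // on the base URL (e.g. "http://localhost:8983/solr/").
    static URI schemaUri(String solrBaseUrl, String collection) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return URI.create(base + "/" + collection + "/schema");
    }

    // Issues a GET against the schema endpoint and returns the raw response body.
    static String fetchSchema(String solrBaseUrl, String collection)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(schemaUri(solrBaseUrl, collection))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Schema request failed with HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumed defaults; override with: java SchemaApiSketch <baseUrl> <collection>
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8983/solr";
        String collection = args.length > 1 ? args[1] : "collection1";
        System.out.println(fetchSchema(baseUrl, collection));
    }
}
```

Running `main` needs a reachable Solr node; `schemaUri` can be checked on its own without one.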