[ https://issues.apache.org/jira/browse/SOLR-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719527#comment-13719527 ]
Otis Gospodnetic commented on SOLR-5075:
----------------------------------------

[~r...@wmds.ro] you should close this issue and ask on the solr-user mailing list.

> SolrCloud commit process is too time consuming, even if documents are light
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-5075
>                 URL: https://issues.apache.org/jira/browse/SOLR-5075
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis, SolrCloud
>    Affects Versions: 4.1
>         Environment: SolrCloud 4.1, internal Zookeeper, 16 shards, custom java importer.
>                      Server: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 32 cores, 192 GB RAM, 10 TB SSD and 50 TB SAS storage
>            Reporter: Radu Ghita
>              Labels: import, solrconfig.xml
>
> We have a client whose business model requires indexing a billion rows from MySQL into Solr each month, within a small time frame. The documents are very light, but their number is very high, and we need to achieve speeds of around 80-100k docs/s. The built-in Solr indexer tops out at 40-50k docs/s, and after some hours (~12) it crashes, with throughput degrading as the hours go by. We have therefore developed a custom Java importer that connects directly to MySQL and to SolrCloud via Zookeeper, fetches data from MySQL, creates documents, and imports them into Solr. This helps because we open ~50 threads, which speeds up the indexing process. We have optimized the MySQL queries (MySQL was the initial bottleneck) and the speeds we get now are over 100k docs/s, but as the index grows, Solr takes very long on adding documents. I assume something in solrconfig is making Solr stall and even block after 100 million documents have been indexed.
> Here is the Java code that creates documents and then adds them to the Solr server:
>
>     public void createDocuments() throws SQLException, SolrServerException, IOException
>     {
>         App.logger.write("Creating documents..");
>         this.docs = new ArrayList<SolrInputDocument>();
>         App.logger.incrementNumberOfRows(this.size);
>         while(this.results.next())
>         {
>             this.docs.add(this.getDocumentFromResultSet(this.results));
>         }
>         this.statement.close();
>         this.results.close();
>     }
>
>     public void commitDocuments() throws SolrServerException, IOException
>     {
>         App.logger.write("Committing..");
>         App.solrServer.add(this.docs); // here it stays very long and then blocks
>         App.logger.incrementNumberOfRows(this.docs.size());
>         this.docs.clear();
>     }
>
> I am also pasting the solrconfig.xml parameters relevant to this discussion:
>
>     <maxIndexingThreads>128</maxIndexingThreads>
>     <useCompoundFile>false</useCompoundFile>
>     <ramBufferSizeMB>10000</ramBufferSizeMB>
>     <maxBufferedDocs>1000000</maxBufferedDocs>
>     <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>         <int name="maxMergeAtOnce">20000</int>
>         <int name="segmentsPerTier">1000000</int>
>         <int name="maxMergeAtOnceExplicit">10000</int>
>     </mergePolicy>
>     <mergeFactor>100</mergeFactor>
>     <termIndexInterval>1024</termIndexInterval>
>     <autoCommit>
>         <maxTime>15000</maxTime>
>         <maxDocs>1000000</maxDocs>
>         <openSearcher>false</openSearcher>
>     </autoCommit>
>     <autoSoftCommit>
>         <maxTime>2000000</maxTime>
>     </autoSoftCommit>
>
> Thanks a lot for any answers, and excuse my long text; I'm new to this JIRA. If there's any other info needed, please let me know.

--
This message is automatically generated by JIRA.
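An editorial note on the quoted importer code, not part of the original issue: buffering an entire result set into one ArrayList and pushing it to Solr in a single add() call is a likely contributor to the long blocking call, since one request can end up carrying hundreds of thousands of documents. A common alternative is to flush in fixed-size batches. Below is a minimal sketch of that batching logic only; the SolrJ add() call is replaced by a callback so the snippet stays self-contained, and names such as BatchingIndexer, flushBatch, and BATCH_SIZE are hypothetical, not from the issue:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchingIndexer {
    // Hypothetical batch size; real values are usually tuned per workload.
    static final int BATCH_SIZE = 10_000;

    private final List<String> buffer = new ArrayList<>();
    // Stands in for solrServer.add(docs) in the quoted code.
    private final Consumer<List<String>> flusher;

    BatchingIndexer(Consumer<List<String>> flusher) {
        this.flusher = flusher;
    }

    // Add one document; flush automatically once the buffer fills up,
    // so no single request grows without bound.
    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= BATCH_SIZE) {
            flushBatch();
        }
    }

    // Send the current batch (if any) and clear the buffer.
    void flushBatch() {
        if (!buffer.isEmpty()) {
            flusher.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

With ~50 importer threads, each thread would hold its own BatchingIndexer (or use a concurrent client that batches internally), so a slow add only stalls one batch rather than the whole run.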
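Also worth noting about the quoted solrconfig.xml: a ramBufferSizeMB of 10000 together with segmentsPerTier of 1000000 effectively defers merging until an enormous amount of segment data has accumulated, which is consistent with the reported stall after ~100 million documents. A more conventional starting point for a bulk load might look like the fragment below; these values are illustrative assumptions on my part, not a tested recommendation for this workload:

```xml
<ramBufferSizeMB>512</ramBufferSizeMB>
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>
<autoCommit>
  <!-- hard commit every 15 s, without opening a searcher,
       so the transaction log does not grow unbounded -->
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- -1 disables soft commits entirely during the bulk load -->
  <maxTime>-1</maxTime>
</autoSoftCommit>
```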