[ https://issues.apache.org/jira/browse/SOLR-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719527#comment-13719527 ]
Otis Gospodnetic commented on SOLR-5075:
----------------------------------------

[~r...@wmds.ro] you should close this issue and ask on the solr-user mailing list.

> SolrCloud commit process is too time consuming, even if documents are light
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-5075
>                 URL: https://issues.apache.org/jira/browse/SOLR-5075
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis, SolrCloud
>    Affects Versions: 4.1
>         Environment: SolrCloud 4.1, internal Zookeeper, 16 shards, custom java importer.
>                      Server: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 32 cores, 192 GB RAM, 10 TB SSD and 50 TB SAS storage
>            Reporter: Radu Ghita
>              Labels: import, solrconfig.xml
>
> We have a client whose business model requires indexing a billion rows from MySQL into Solr each month, within a small time frame. The documents are very light, but their number is very high, and we need to achieve speeds of around 80-100k docs/s. The built-in Solr indexer tops out at 40-50k docs/s, and after some hours (~12) it crashes, with throughput degrading as the hours go by. We have therefore developed a custom Java importer that connects directly to MySQL and to SolrCloud via Zookeeper, fetches data from MySQL, creates documents, and imports them into Solr. This helps because we open ~50 threads, which speeds up the indexing process. We have optimized the MySQL queries (MySQL was the initial bottleneck) and the speeds we get now are over 100k docs/s, but as the index grows, Solr takes very long on adding documents. I assume something in solrconfig is making Solr stall and even block after 100 million documents have been indexed.
> Here is the Java code that creates documents and then adds them to the Solr server:
>
>     public void createDocuments() throws SQLException, SolrServerException, IOException
>     {
>         App.logger.write("Creating documents..");
>         this.docs = new ArrayList<SolrInputDocument>();
>         App.logger.incrementNumberOfRows(this.size);
>         while(this.results.next())
>         {
>             this.docs.add(this.getDocumentFromResultSet(this.results));
>         }
>         this.statement.close();
>         this.results.close();
>     }
>
>     public void commitDocuments() throws SolrServerException, IOException
>     {
>         App.logger.write("Committing..");
>         App.solrServer.add(this.docs); // here it stays very long and then blocks
>         App.logger.incrementNumberOfRows(this.docs.size());
>         this.docs.clear();
>     }
>
> I am also pasting the solrconfig.xml parameters relevant to this discussion:
>
>     <maxIndexingThreads>128</maxIndexingThreads>
>     <useCompoundFile>false</useCompoundFile>
>     <ramBufferSizeMB>10000</ramBufferSizeMB>
>     <maxBufferedDocs>1000000</maxBufferedDocs>
>     <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>         <int name="maxMergeAtOnce">20000</int>
>         <int name="segmentsPerTier">1000000</int>
>         <int name="maxMergeAtOnceExplicit">10000</int>
>     </mergePolicy>
>     <mergeFactor>100</mergeFactor>
>     <termIndexInterval>1024</termIndexInterval>
>     <autoCommit>
>         <maxTime>15000</maxTime>
>         <maxDocs>1000000</maxDocs>
>         <openSearcher>false</openSearcher>
>     </autoCommit>
>     <autoSoftCommit>
>         <maxTime>2000000</maxTime>
>     </autoSoftCommit>
>
> Thanks a lot for any answers, and excuse my long text; I'm new to this JIRA. If there's any other info needed, please let me know.

--
This message is automatically generated by JIRA.
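An editorial note on the quoted importer code, not part of the original issue: buffering an entire result set into one ArrayList and pushing it to Solr in a single add() call is a likely contributor to the long blocking call, since one request can end up carrying hundreds of thousands of documents. A common alternative is to flush in fixed-size batches. Below is a minimal sketch of that batching logic only; the SolrJ add() call is replaced by a callback so the snippet stays self-contained, and names such as BatchingIndexer, flushBatch, and BATCH_SIZE are hypothetical, not from the issue:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchingIndexer {
    // Hypothetical batch size; real values are usually tuned per workload.
    static final int BATCH_SIZE = 10_000;

    private final List<String> buffer = new ArrayList<>();
    // Stands in for solrServer.add(docs) in the quoted code.
    private final Consumer<List<String>> flusher;

    BatchingIndexer(Consumer<List<String>> flusher) {
        this.flusher = flusher;
    }

    // Add one document; flush automatically once the buffer fills up,
    // so no single request grows without bound.
    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= BATCH_SIZE) {
            flushBatch();
        }
    }

    // Send the current batch (if any) and clear the buffer.
    void flushBatch() {
        if (!buffer.isEmpty()) {
            flusher.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

With ~50 importer threads, each thread would hold its own BatchingIndexer (or use a concurrent client that batches internally), so a slow add only stalls one batch rather than the whole run.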
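Also worth noting about the quoted solrconfig.xml: a ramBufferSizeMB of 10000 together with segmentsPerTier of 1000000 effectively defers merging until an enormous amount of segment data has accumulated, which is consistent with the reported stall after ~100 million documents. A more conventional starting point for a bulk load might look like the fragment below; these values are illustrative assumptions on my part, not a tested recommendation for this workload:

```xml
<ramBufferSizeMB>512</ramBufferSizeMB>
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>
<autoCommit>
  <!-- hard commit every 15 s, without opening a searcher,
       so the transaction log does not grow unbounded -->
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- -1 disables soft commits entirely during the bulk load -->
  <maxTime>-1</maxTime>
</autoSoftCommit>
```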