[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080537#comment-15080537 ]
Joel Bernstein commented on SOLR-7535:
--------------------------------------

After some more thought, I'm thinking of adding a buffer="true/false" parameter to the UpdateStream.

If buffer="true" then the UpdateStream will first write each batch to local disk. During the buffering phase each tuple will return the "buffered" count. When all the records have been buffered, each call to read() will index one batch from disk and return the "indexed" count.

I believe we're going to need this buffering approach when indexing large amounts of data from a large number of shards. For example, with 10 workers and 20 shards with 3 replicas we could expect well over 10 million records per second being exported from the shards. Indexing will be much, much slower, so the exporting shards will be blocked for minutes at a time, causing timeouts. Buffering to local disk should be able to keep up, even with compression.

If buffer="false" then the UpdateStream will update the collection directly, the way that it does now. This will work fine for smaller loads.

> Add UpdateStream to Streaming API and Streaming Expression
> ----------------------------------------------------------
>
>                 Key: SOLR-7535
>                 URL: https://issues.apache.org/jira/browse/SOLR-7535
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, SolrJ
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>         Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch,
> SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and
> streaming expressions. The UpdateStream will wrap a TupleStream and send the
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections,
> merge and transform the streams and send the transformed data to another Solr
> Cloud collection.
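
The two-phase read() behaviour proposed in the comment (buffer every batch to local disk first, then index one batch per call) could look roughly like the sketch below. It is not the SolrJ UpdateStream and not taken from any of the attached patches. Everything in it is a hypothetical stand-in: the class name BufferedUpdateSketch, the Iterator that plays the role of the wrapped TupleStream, the Consumer that plays the role of the update sent to the collection, the one-record-per-line file format, the choice to report running totals, and the "EOF" marker.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch of the proposed buffer="true/false" behaviour. */
class BufferedUpdateSketch {
    private final Iterator<String> source;        // stand-in for the wrapped TupleStream
    private final Consumer<List<String>> indexer; // stand-in for the update to the collection
    private final int batchSize;
    private final boolean buffer;
    private final Path dir;                       // local-disk buffer location (assumed temp dir)
    private final Deque<Path> batchFiles = new ArrayDeque<>();
    private long buffered = 0;
    private long indexed = 0;

    BufferedUpdateSketch(Iterator<String> source, Consumer<List<String>> indexer,
                         int batchSize, boolean buffer) throws IOException {
        this.source = source;
        this.indexer = indexer;
        this.batchSize = batchSize;
        this.buffer = buffer;
        this.dir = Files.createTempDirectory("update-buffer");
    }

    /** Returns one status tuple per batch, then an EOF marker once all work is done. */
    Map<String, Object> read() throws IOException {
        if (source.hasNext()) {
            List<String> batch = nextBatch();
            if (buffer) {
                // Buffering phase: drain the source quickly, writing each batch to disk.
                // Assumes records contain no line breaks (one record per line).
                Path file = Files.createTempFile(dir, "batch", ".txt");
                Files.write(file, batch);
                batchFiles.add(file);
                buffered += batch.size();
                return Map.of("buffered", buffered); // running total (assumption)
            }
            // buffer=false: index directly, so the source waits on the indexer.
            return index(batch);
        }
        // Indexing phase: the source is exhausted, so index one buffered batch per call.
        Path file = batchFiles.poll();
        if (file != null) {
            List<String> batch = Files.readAllLines(file);
            Files.delete(file);
            return index(batch);
        }
        return Map.of("EOF", true);
    }

    private Map<String, Object> index(List<String> batch) {
        indexer.accept(batch);
        indexed += batch.size();
        return Map.of("indexed", indexed); // running total (assumption)
    }

    private List<String> nextBatch() {
        List<String> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }
}
```

With five records and a batch size of two, a buffered instance would return three "buffered" tuples, then three "indexed" tuples, then the EOF marker. The point of the split is that the source is drained at disk-write speed, so slow indexing no longer holds the exporting side open.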