[ https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856461#comment-13856461 ]
Mikhail Khludnev commented on SOLR-5244:
----------------------------------------

bq. Does it cause any issues with the normal response writer flow?

I don't think so. It hits dedicated handlers, so it's well separated from the regular flow.

bq. More testing of this feature shows

I wonder if you can post numbers and a profiler stack trace. How many fields are dumped in your test case? I have one thought: _BinaryDocValuesImpl.get(int, BytesRef)_ hits _docToOffset_ and then _bytes_ for every given docnum. Assuming that sequential reading is faster than random reading, it makes sense to buffer an array of offsets and then loop through it when reading _bytes_. Also, looping over _binaryFieldWriters_ for every doc looks like a columnar performance killer (rough sketches of both points are appended after the issue description below).

bq. I think we can build segment level caches..

Can you highlight how that differs from the good old FieldCaches (I mean what's produced by FieldCacheImpl.BinaryDocValuesCache)?

bq. I'm shooting to achieve an export rate of 5+ million small records

That sounds really ambitious to me. My expectation for average IO rate is 100-200 MB/sec (and I might be wrong here), so a few million records might hit that ceiling.

> Full Search Result Export
> -------------------------
>
>                 Key: SOLR-5244
>                 URL: https://issues.apache.org/jira/browse/SOLR-5244
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 5.0
>            Reporter: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: SOLR-5244.patch
>
>
> It would be great if Solr could efficiently export entire search result sets
> without scoring or ranking documents. This would allow external systems to
> perform rapid bulk imports from Solr. It also provides a possible platform
> for exporting results to support distributed join scenarios within Solr.
> This ticket provides a patch that has two pluggable components:
> 1) ExportQParserPlugin: a post filter that gathers a BitSet with the document
> results and does not delegate to ranking collectors. Instead, it puts the
> BitSet on the request context.
> 2) BinaryExportWriter: an output writer that iterates the BitSet and writes
> the entire result as a binary stream. A header is provided at the beginning
> of the stream so external clients can self-configure.
> Note:
> These two components are sufficient for a non-distributed environment.
> For distributed export a new request handler will need to be developed.
> After applying the patch and building the dist or example, you can register
> the components through the following changes to solrconfig.xml
> Register the export contrib libraries:
> <lib dir="../../../dist/" regex="solr-export-\d.*\.jar" />
> Register the "export" queryParser with the following line:
> <queryParser name="export" class="org.apache.solr.export.ExportQParserPlugin"/>
> Register the "xbin" writer:
> <queryResponseWriter name="xbin" class="org.apache.solr.export.BinaryExportWriter"/>
> The following query will perform the export:
> {code}
> http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
> {code}
> The initial patch supports export of four data types:
> 1) Single-value trie int, long and float
> 2) Binary doc values
> The numerics are currently exported from the FieldCache, and the binary doc
> values can be in memory or on disk.
> Since this is designed to export very large result sets efficiently, stored
> fields are not used for the export.
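To make the described two-component flow concrete, here is a minimal sketch: a post-filter step records matching docs in a bitset without scoring, and the export writer later walks the set bits and streams one value per field after a small header. This is an illustration only; the hypothetical FieldReader interface stands in for DocValues/FieldCache lookups, java.util.BitSet stands in for Lucene's bitset, and none of Solr's actual PostFilter or QueryResponseWriter plumbing is shown.

{code}
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.BitSet;
import java.util.Map;

/** Hypothetical per-field accessor, standing in for DocValues/FieldCache reads. */
interface FieldReader {
    long longValue(int docId) throws IOException;
}

class ExportSketch {

    /** Post-filter side: record every matching doc; nothing is delegated to ranking collectors. */
    static void collect(BitSet hits, int docId) {
        hits.set(docId);
    }

    /** Writer side: emit a small header, then one row per set bit. */
    static void export(BitSet hits, Map<String, FieldReader> fields, OutputStream os)
            throws IOException {
        DataOutputStream out = new DataOutputStream(os);
        out.writeInt(hits.cardinality());   // header: number of exported docs
        out.writeInt(fields.size());        // header: number of fields per doc
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            for (FieldReader reader : fields.values()) {   // per-doc loop over field writers
                out.writeLong(reader.longValue(doc));
            }
        }
        out.flush();
    }
}
{code}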
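And a rough sketch of the buffering/columnar idea from the comment above, assuming hypothetical OffsetSource and ByteSource interfaces that stand in for the _docToOffset_ and _bytes_ structures; this illustrates the suggested access pattern only, not the actual Lucene doc-values implementation.

{code}
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical stand-in for docToOffset: doc id -> byte offset. */
interface OffsetSource {
    long offset(int docId);
}

/** Hypothetical stand-in for bytes: offset -> value bytes. */
interface ByteSource {
    byte[] read(long offset) throws IOException;
}

class ColumnarExportSketch {

    /**
     * Per-field (columnar) pass: buffer the offsets for a block of docs in one
     * sweep, then resolve the bytes through the buffered offsets, instead of
     * alternating offset and byte reads inside a per-doc, per-field nested loop.
     */
    static void exportColumn(int[] docIds, OffsetSource offsets, ByteSource bytes,
                             DataOutputStream out) throws IOException {
        long[] buffered = new long[docIds.length];
        for (int i = 0; i < docIds.length; i++) {   // pass 1: offsets only
            buffered[i] = offsets.offset(docIds[i]);
        }
        for (long off : buffered) {                 // pass 2: bytes only
            byte[] value = bytes.read(off);
            out.writeInt(value.length);
            out.write(value);
        }
    }
}
{code}

The point is simply that reads against each underlying structure stay clustered rather than alternating for every document and field.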