[ https://issues.apache.org/jira/browse/LUCENE-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005286#comment-13005286 ]
Michael McCandless commented on LUCENE-2958:
--------------------------------------------
bq. I don't think that matters? I.e., LineDocSource returns DocData, it's the
DocMaker which creates the actual Lucene Field and Document instances. So all
LDS needs to know is the name of the field.
OK, that's nice. So a simple string<SEP>string<SEP>... header could define the
field names.
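
A minimal sketch of the header idea, assuming a tab as <SEP> and using a
hypothetical HeaderLineFormat class (not the actual LineDocSource code). The
header is split once per file; each document line is then scanned with
indexOf, so no String[] is allocated per document. The Map is only for
illustration; a real source would presumably fill a DocData instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads field names from a header line, then parses doc lines. */
final class HeaderLineFormat {
    // Assumption: fields are separated by a single tab character.
    static final char SEP = '\t';

    private final String[] fieldNames;

    HeaderLineFormat(String headerLine) {
        // Done once per file, so this split() is not on the per-document path.
        this.fieldNames = headerLine.split(String.valueOf(SEP), -1);
    }

    /** Maps each header field name to the matching value in the line. */
    Map<String, String> parse(String line) {
        Map<String, String> doc = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i < fieldNames.length; i++) {
            boolean last = (i == fieldNames.length - 1);
            // The last field takes the rest of the line, separators included.
            int end = last ? -1 : line.indexOf(SEP, start);
            if (end < 0) {
                doc.put(fieldNames[i], line.substring(start));
                break; // fields missing from a short line are simply left out
            }
            doc.put(fieldNames[i], line.substring(start, end));
            start = end + 1;
        }
        return doc;
    }
}
```

With a header of title<SEP>date<SEP>body, a line is parsed into those three
names without the parser knowing them in advance.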
bq. Is that array alloc() really critical?
Probably fairly minor, but this is a death-by-a-thousand-cuts situation? I.e.,
these changes only make our index throughput tests slower; hopefully by a tiny
amount, but it'll add up over time.
bq. Maybe instead of doing the split ourselves, we can have a getDocData(String
line), which will be implemented by default to search for TITLE, DATE and BODY,
using the optimized code, and can be overridden by others to parse line
differently?
I think that's good?
Or, if we do the header idea... then a given usage need not override
getDocData? Like it's generic at that point?
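
For comparison, a rough sketch of the overridable getDocData(String line)
approach. SimpleDocData, LineParser and TitleBodyLineParser are hypothetical
stand-ins, not the real contrib/benchmark classes, and the tab separator is an
assumption. The default looks for title, date and body with indexOf instead of
split(), and a subclass overrides the method to read a different layout.

```java
/** Hypothetical stand-in for the benchmark's DocData. */
class SimpleDocData {
    String title;
    String date;
    String body;
}

/** Hypothetical parser with an overridable per-line hook. */
class LineParser {
    // Assumption: fields are separated by a single tab character.
    static final char SEP = '\t';

    /** Default layout: title SEP date SEP body, located with indexOf (no String[] allocated). */
    SimpleDocData getDocData(String line) {
        int k1 = line.indexOf(SEP);
        int k2 = (k1 < 0) ? -1 : line.indexOf(SEP, k1 + 1);
        if (k2 < 0) {
            throw new IllegalArgumentException("expected title, date and body: " + line);
        }
        // A real source might reuse one instance instead of allocating per line.
        SimpleDocData doc = new SimpleDocData();
        doc.title = line.substring(0, k1);
        doc.date = line.substring(k1 + 1, k2);
        doc.body = line.substring(k2 + 1);
        return doc;
    }
}

/** Example override: lines that carry only title SEP body. */
class TitleBodyLineParser extends LineParser {
    @Override
    SimpleDocData getDocData(String line) {
        int k = line.indexOf(SEP);
        if (k < 0) {
            throw new IllegalArgumentException("expected title and body: " + line);
        }
        SimpleDocData doc = new SimpleDocData();
        doc.title = line.substring(0, k);
        doc.body = line.substring(k + 1); // date stays null in this layout
        return doc;
    }
}
```

The trade-off is the one raised above: with the override, each new layout
needs its own subclass, while a header line would let one generic parser
handle any set of fields.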
> WriteLineDocTask improvements
> -----------------------------
>
> Key: LUCENE-2958
> URL: https://issues.apache.org/jira/browse/LUCENE-2958
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/benchmark
> Reporter: Doron Cohen
> Assignee: Doron Cohen
> Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2958.patch, LUCENE-2958.patch
>
>
> Make WriteLineDocTask and LineDocSource more flexible/extendable:
> * allow emitting lines for empty docs as well (keep current behavior as default)
> * allow more/less/other fields