[ 
https://issues.apache.org/jira/browse/LUCENE-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005286#comment-13005286
 ] 

Michael McCandless commented on LUCENE-2958:
--------------------------------------------

bq. I don't think that matters? I.e., LineDocSource returns DocData, it's the 
DocMaker which creates the actual Lucene Field and Document instances. So all 
LDS needs to know is the name of the field.

OK, that's nice.  So a simple string<SEP>string<SEP>... header could define the 
field names.
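To make the header idea concrete, here is a minimal sketch. It assumes the first line of the file lists the field names, separated by the same separator as the data lines. The tab separator and the `HeaderLineParser` class are hypothetical and not part of the current LineDocSource. Each data line is then mapped onto those names in order, so the field names come from the file rather than from code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: field names are read once from a header line,
// then every data line is mapped onto them column by column.
public class HeaderLineParser {
    // Assumption: tab separator, as in the existing line file format.
    static final char SEP = '\t';

    private final String[] fieldNames;

    public HeaderLineParser(String headerLine) {
        // split() runs once per file here, not once per document.
        // Limit -1 keeps trailing empty names.
        this.fieldNames = headerLine.split(String.valueOf(SEP), -1);
    }

    // Returns field name -> value, preserving header order.
    public Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i < fieldNames.length; i++) {
            boolean last = (i == fieldNames.length - 1);
            int end = last ? -1 : line.indexOf(SEP, start);
            if (end < 0) {
                // Last field, or a line with fewer columns than the header:
                // take whatever remains (possibly the empty string).
                fields.put(fieldNames[i], line.substring(start));
                start = line.length();
            } else {
                fields.put(fieldNames[i], line.substring(start, end));
                start = end + 1;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        HeaderLineParser p = new HeaderLineParser("title\tdate\tbody");
        System.out.println(p.parse("Hello\t2011-03-09\tSome body text"));
        // prints {title=Hello, date=2011-03-09, body=Some body text}
    }
}
```

With this approach, a file carrying extra or fewer columns only needs a different header line, with no new parsing code.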

bq. Is that array alloc() really critical?

Probably fairly minor, but this is a death-by-a-thousand-cuts situation? I.e.,
these changes can only make our indexing throughput tests slower; hopefully by a
tiny amount, but it adds up over time.
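For illustration, here is a hypothetical comparison of the two ways to cut a TITLE<SEP>DATE<SEP>BODY line. The sketch assumes a tab separator, and `FixedFieldScan`/`Parsed` are made-up names, not Lucene's DocData. `String.split` allocates a `String[]` for every line. Scanning with `indexOf` finds the two separators directly and avoids that array, although the three substrings are still allocated either way. Whether the difference is measurable would have to be checked with the benchmark itself.

```java
// Hypothetical sketch: two ways to cut a TITLE<SEP>DATE<SEP>BODY line.
// Assumptions: SEP is a tab; Parsed is an illustrative holder, not Lucene's DocData.
public final class FixedFieldScan {
    static final char SEP = '\t';

    public static final class Parsed {
        public final String title;
        public final String date;
        public final String body;

        Parsed(String title, String date, String body) {
            this.title = title;
            this.date = date;
            this.body = body;
        }
    }

    // String.split allocates an intermediate String[] for every line.
    public static Parsed viaSplit(String line) {
        // Limit 3: the body keeps any later separators.
        String[] parts = line.split(String.valueOf(SEP), 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new Parsed(parts[0], parts[1], parts[2]);
    }

    // indexOf locates the two separators directly; no String[] is created.
    // The three substrings are still allocated, as in the split version.
    public static Parsed viaIndexOf(String line) {
        int k1 = line.indexOf(SEP);
        int k2 = (k1 < 0) ? -1 : line.indexOf(SEP, k1 + 1);
        if (k2 < 0) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new Parsed(line.substring(0, k1), line.substring(k1 + 1, k2), line.substring(k2 + 1));
    }

    public static void main(String[] args) {
        Parsed p = viaIndexOf("Title\tDate\tBody text");
        System.out.println(p.title + " | " + p.date + " | " + p.body);
    }
}
```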

bq. Maybe instead of doing the split ourselves, we can have a getDocData(String 
line), which will be implemented by default to search for TITLE, DATE and BODY, 
using the optimized code, and can be overridden by others to parse line 
differently?
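A rough sketch of the proposed hook follows. A default `getDocData(String line)` cuts TITLE, DATE and BODY with `indexOf`, and a subclass overrides it for a different line layout. This is only an illustration under stated assumptions. It uses a tab separator and returns a plain `Map` instead of Lucene's DocData, and the class names `DefaultLineParser` and `TitleBodyLineParser` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an overridable getDocData(String line) hook.
// Assumptions: tab separator; a Map stands in for Lucene's DocData.
public class DefaultLineParser {
    static final char SEP = '\t';

    // Default layout: TITLE<SEP>DATE<SEP>BODY, cut without split().
    public Map<String, String> getDocData(String line) {
        int k1 = line.indexOf(SEP);
        int k2 = (k1 < 0) ? -1 : line.indexOf(SEP, k1 + 1);
        if (k2 < 0) {
            throw new IllegalArgumentException("expected TITLE, DATE, BODY: " + line);
        }
        Map<String, String> doc = new HashMap<>();
        doc.put("title", line.substring(0, k1));
        doc.put("date", line.substring(k1 + 1, k2));
        doc.put("body", line.substring(k2 + 1));
        return doc;
    }

    public static void main(String[] args) {
        DefaultLineParser parser = new TitleBodyLineParser();
        // Dispatches to the subclass override: prints the title and body, no date.
        System.out.println(parser.getDocData("Some title\tSome body"));
    }
}

// Hypothetical subclass for files whose lines are just TITLE<SEP>BODY.
class TitleBodyLineParser extends DefaultLineParser {
    @Override
    public Map<String, String> getDocData(String line) {
        int k = line.indexOf(SEP);
        if (k < 0) {
            throw new IllegalArgumentException("expected TITLE, BODY: " + line);
        }
        Map<String, String> doc = new HashMap<>();
        doc.put("title", line.substring(0, k));
        doc.put("body", line.substring(k + 1));
        return doc;
    }
}
```

The default path keeps the optimized, array-free scan, and only users with a different format pay for their own parsing.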

I think that's good?

Or, if we do the header idea... then a given usage need not override
getDocData at all?  I.e., it's generic at that point?

> WriteLineDocTask improvements
> -----------------------------
>
>                 Key: LUCENE-2958
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2958
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-2958.patch, LUCENE-2958.patch
>
>
> Make WriteLineDocTask and LineDocSource more flexible/extendable:
> * allow emitting lines for empty docs too (keep current behavior as the default)
> * allow more/less/other fields

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
