[ 
https://issues.apache.org/jira/browse/LUCENE-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006211#comment-13006211
 ] 

Doron Cohen commented on LUCENE-2958:
-------------------------------------

bq. So let's keep the current impl as optimized as it was before, but allow for 
a simple extension point?

I believe the original case in LineDocSource is as optimized as it was before.

If you take a look at the inner class SimpleDocDataFiller - it has exactly the 
same logic as before.

The more general logic - the one in HeaderDocDataFiller which processes any 
header line for you - is more complex, and perhaps somewhat less efficient - 
but only slightly I believe, as the additional cost is a switch statement per 
field.

But please do not review this code just yet - I'm in the middle of improving it:
* By default LineDocSource should use the SimpleDocDataFiller not only when 
there's no header line in the file (this part is covered already), but also 
when the header line is the same as the default one (the default coming from 
WriteLineDocTask).
* Selecting the DocDataFiller to use should be possible through a property - as 
is the spirit of this package.
* DocDataFiller would be better named DocDataLineReader.
* DocDataLineReader inner methods like fillDate2() should be inlined (i.e. 
removed).
* HeaderDocDataLineReader should switch on an enum rather than on ints.
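
To illustrate the last point, here is a rough sketch of a header-driven reader
that resolves each column to an enum once, when the header is read, and then
pays one switch per field on every line. This is not the patch code: the class
name, the enum constants, the tab separator and the "prop." prefix for unknown
columns are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only; none of these names come from the actual patch.
class HeaderLineReaderSketch {

  /** Assumed set of well-known fields; anything else is treated as OTHER. */
  enum Field { TITLE, DATE, BODY, OTHER }

  private final String[] names;   // column names as written in the header
  private final Field[] columns;  // the same columns, resolved to enum values

  /** Resolves the (assumed tab-separated) header line once, up front. */
  HeaderLineReaderSketch(String headerLine) {
    names = headerLine.split("\t", -1);
    columns = new Field[names.length];
    for (int i = 0; i < names.length; i++) {
      columns[i] = toField(names[i]);
    }
  }

  private static Field toField(String name) {
    switch (name) {
      case "title": return Field.TITLE;
      case "date":  return Field.DATE;
      case "body":  return Field.BODY;
      default:      return Field.OTHER;
    }
  }

  /** Parses one data line; the per-line cost is one switch per field. */
  Map<String, String> parse(String line) {
    String[] values = line.split("\t", -1);
    Map<String, String> doc = new LinkedHashMap<>();
    int n = Math.min(values.length, columns.length);
    for (int i = 0; i < n; i++) {
      switch (columns[i]) {
        case TITLE: doc.put("title", values[i]); break;
        case DATE:  doc.put("date", values[i]);  break;
        case BODY:  doc.put("body", values[i]);  break;
        default:    doc.put("prop." + names[i], values[i]); break;
      }
    }
    return doc;
  }
}
```

A reader for the fixed default layout would skip the header lookup and the
switch entirely, which is why it stays the cheaper path.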

These changes would make LineDocSource more efficient and more readable.
I feel that the added functionality is worth the additional complexity in the 
code. And, for those wishing to save the extra cycles of the general 
HeaderDocDataLineReader, it is possible to implement a custom one and pass its 
name as the (new) property docdata.line.reader.
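
As a rough idea of how such a property could be honored, the sketch below
looks up a class name under docdata.line.reader and instantiates it by
reflection, falling back to a simple default. Only the property name comes
from this comment; the LineReader interface, both implementations and the use
of java.util.Properties in place of the benchmark configuration are
hypothetical stand-ins.

```java
import java.util.Properties;

// Hypothetical sketch only; the interface and classes are illustrative.
class LineReaderSelectionSketch {

  /** Stand-in for the reader abstraction discussed above. */
  interface LineReader {
    String[] read(String line);
  }

  /** Assumed default: title, date and body separated by tabs. */
  public static class SimpleLineReader implements LineReader {
    public String[] read(String line) {
      return line.split("\t", 3);
    }
  }

  /** Example of a cheaper custom reader that keeps only the first column. */
  public static class TitleOnlyLineReader implements LineReader {
    public String[] read(String line) {
      int tab = line.indexOf('\t');
      return new String[] { tab < 0 ? line : line.substring(0, tab) };
    }
  }

  /** Creates the reader named by the property, or the default if it is unset. */
  static LineReader create(Properties config) {
    String className = config.getProperty(
        "docdata.line.reader", SimpleLineReader.class.getName());
    try {
      return Class.forName(className)
          .asSubclass(LineReader.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot create line reader: " + className, e);
    }
  }
}
```

With the property unset, create() returns the default reader; setting it to
the binary name of another LineReader implementation swaps that one in without
touching the source that consumes the lines.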

I am working on an updated patch...

> WriteLineDocTask improvements
> -----------------------------
>
>                 Key: LUCENE-2958
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2958
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-2958.patch, LUCENE-2958.patch, LUCENE-2958.patch
>
>
> Make WriteLineDocTask and LineDocSource more flexible/extendable:
> * allow to emit lines also for empty docs (keep current behavior as default)
> * allow more/less/other fields

