[
https://issues.apache.org/jira/browse/LUCENE-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doron Cohen updated LUCENE-2958:
--------------------------------
Attachment: LUCENE-2958.patch
Hi, thanks Mike and Shai for the review and great comments.
Attaching an updated patch.
WriteLineDocTask now writes the field names as a header line at the top of the
result file. It always does this; a property to disable the header might be
useful, to allow the previous behavior (no header).
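
To illustrate the header idea, here is a minimal sketch, not the code in the
patch. The class name, the HEADER_MARKER string, the tab separator and the
writeHeader flag are all assumptions made up for this example; the patch may
format the header differently.

```java
// Hypothetical sketch: a line file whose first line names the fields.
class HeaderLineSketch {
  static final char SEP = '\t';                           // assumed separator
  static final String HEADER_MARKER = "FIELDS_HEADER###"; // hypothetical marker

  // Builds the header: the marker followed by the SEP-separated field names.
  static String headerLine(String[] fields) {
    StringBuilder sb = new StringBuilder(HEADER_MARKER);
    for (String f : fields) {
      sb.append(SEP).append(f);
    }
    return sb.toString();
  }

  // Writes an optional header line, then one line per doc.
  // writeHeader = false corresponds to the previous behavior (no header).
  static String write(String[] fields, String[][] docs, boolean writeHeader) {
    StringBuilder out = new StringBuilder();
    if (writeHeader) {
      out.append(headerLine(fields)).append('\n');
    }
    for (String[] doc : docs) {
      out.append(String.join(String.valueOf(SEP), doc)).append('\n');
    }
    return out.toString();
  }
}
```

A reader can then check whether the first line starts with the marker to decide
if the file has a header at all.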
There are quite a few involved changes to LineDocSource:
- replaced line.split(SEP) with the original repeated search for SEP.
- Method fillDocData(doc,fields[]) was changed to take a line String instead of
the array of fields.
- That method was wrapped in a new interface, DocDataFiller, for which there are
now two implementations:
-- SimpleDocDataFiller is used when there is no header line in the input file.
It implements the original logic from before this change, so existing
line-doc files that have no header line can still be used.
-- HeaderDocDataFiller is used when the input file has a header line. Its
implementation populates both the fixed fields and the flexible properties of
DocData:
--- At construction of the filler, a mapping is created from each field's
position in the header line to a setter method of docData. The mapping uses
neither reflection nor a HashMap; it is simply an int[] posToM, where if
posToM[3] = 1 then later, when handling field no. 3 in the line, the method
fillDate3() will be called, which in turn calls docData.setDate() through a
switch statement. If there is no mapping to a DocData setter, the value is put
in the DocData properties object instead. So this is quite general, with some
performance overhead, though I think less than reflection (I did not measure
this).
- An extension point for overriding the filler creation is through two new
methods:
-- createDocDataFiller() for the case of no header line
-- createDocDataFiller(String[] header) when a header line is found in the input
- Note that filler creation is done once, when reading the first line of the
input file.
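
The position-to-setter mapping described above can be sketched roughly as
follows. This is only an illustration under assumptions, not the patch itself:
DocDataSketch is a hypothetical stand-in for DocData, the field names "title",
"date" and "body", the constants, and the tab separator are made up for the
example, and the switch is inlined here rather than going through per-field
fill methods.

```java
import java.util.Properties;

// Hypothetical stand-in for DocData: fixed fields plus free-form properties.
class DocDataSketch {
  String title, date, body;
  final Properties props = new Properties();
}

// Sketch of a header-driven filler: an int[] maps field position to a setter.
class HeaderFillerSketch {
  static final char SEP = '\t'; // assumed separator
  private static final int PROP = 0, TITLE = 1, DATE = 2, BODY = 3;

  private final String[] header;
  private final int[] posToM; // posToM[i] selects the setter for field i

  // Built once, from the header line of the input file.
  HeaderFillerSketch(String[] header) {
    this.header = header;
    this.posToM = new int[header.length];
    for (int i = 0; i < header.length; i++) {
      switch (header[i]) {
        case "title": posToM[i] = TITLE; break;
        case "date":  posToM[i] = DATE;  break;
        case "body":  posToM[i] = BODY;  break;
        default:      posToM[i] = PROP;  break; // no setter: use properties
      }
    }
  }

  // Fills doc from one line, scanning for SEP with indexOf instead of split().
  void fill(DocDataSketch doc, String line) {
    int start = 0;
    for (int pos = 0; pos < header.length; pos++) {
      int end = line.indexOf(SEP, start);
      if (end < 0) {
        end = line.length(); // last field on the line
      }
      String value = line.substring(start, end);
      switch (posToM[pos]) {
        case TITLE: doc.title = value; break;
        case DATE:  doc.date = value;  break;
        case BODY:  doc.body = value;  break;
        default:    doc.props.setProperty(header[pos], value); break;
      }
      if (end == line.length()) {
        break; // line exhausted; remaining fields stay unset
      }
      start = end + 1;
    }
  }
}
```

In terms of the extension points listed above, an object like this would be
what createDocDataFiller(String[] header) returns, created once for the whole
input file and then reused for every line.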
Some tests were fixed to account for the existence (or absence) of a header
line.
I think more tests are required, but this should give an idea of how the code
will work.
Bottom line, LineDocSource is more general now, but the code became more
complex.
I have mixed feelings about this - preferring simple code, but the added
functionality is appealing.
> WriteLineDocTask improvements
> -----------------------------
>
> Key: LUCENE-2958
> URL: https://issues.apache.org/jira/browse/LUCENE-2958
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/benchmark
> Reporter: Doron Cohen
> Assignee: Doron Cohen
> Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2958.patch, LUCENE-2958.patch, LUCENE-2958.patch
>
>
> Make WriteLineDocTask and LineDocSource more flexible/extendable:
> * allow to emit lines also for empty docs (keep current behavior as default)
> * allow more/less/other fields