Kasper Sørensen created METAMODEL-31:
----------------------------------------
Summary: CSV module: Inefficient insert/append implementation for
non-FileResources
Key: METAMODEL-31
URL: https://issues.apache.org/jira/browse/METAMODEL-31
Project: Metamodel
Issue Type: Bug
Affects Versions: 4.0
Reporter: Kasper Sørensen
I recently noticed very poor performance when inserting records into a CSV
resource that was a virtual file in a third-party system. We have our own
implementation of the Resource interface for this virtual file type. The
Resource implementation itself is efficient enough, so I was not sure why
inserting records took so much more time than appending the same data to the
resource directly.
The answer lies in the CsvUpdateCallback class, from line 101 onwards:
{code}
// generic handling for any kind of resource
final Action<OutputStream> action = new Action<OutputStream>() {
    @Override
    public void run(OutputStream out) throws Exception {
        final String encoding = _configuration.getEncoding();
        final OutputStreamWriter writer = new OutputStreamWriter(out, encoding);
        writer.write(line);
        writer.flush();
    }
};
// invoked once per inserted record; each call typically opens and closes
// the resource's output stream
if (append) {
    _resource.append(action);
} else {
    _resource.write(action);
}
{code}
It seems that there is an if-block specifically for FileResources. For
FileResources, a trick is applied so that the same FileOutputStream is reused
for every inserted record.
The trouble is that for other types of Resources, the above method is used: a
separate append operation is requested for each record. This typically involves
opening and closing the output stream, and when that happens for EACH record,
it adds up to a severe penalty.
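To make the difference between the two code paths concrete, here is a minimal sketch for a plain local file. It contrasts one open/write/close cycle per record with a single stream that is opened once and reused for all records. The class `AppendStrategies` and its methods are hypothetical illustrations of the idea, not MetaModel's actual FileResource handling; each method returns the number of output streams it opened.

{code}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper (not MetaModel code) contrasting two ways of appending
// CSV lines to a local file. Each method returns how many streams it opened.
class AppendStrategies {

    // Generic path: one open/write/close cycle per record.
    static int appendPerRecord(File file, List<String> lines) throws IOException {
        int opened = 0;
        for (String line : lines) {
            try (Writer writer = new OutputStreamWriter(
                    new FileOutputStream(file, true), StandardCharsets.UTF_8)) {
                opened++;
                writer.write(line);
                writer.write('\n');
            }
        }
        return opened;
    }

    // Stream-reuse idea: open the file once in append mode and write every record to it.
    static int appendReusingStream(File file, List<String> lines) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file, true), StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        }
        return 1;
    }
}
{code}

Both variants produce identical file contents; the per-record variant opens one stream per line, while the reuse variant opens exactly one. For a local file that overhead is modest, but for a Resource backed by a remote or virtual file system each open/close may involve a round trip.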
A suggested fix would be to buffer inserts in memory. Every time the buffer
reached, say, 1000 records, it would be flushed in a single append operation.
This would dramatically improve overall performance.
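A minimal sketch of the buffering idea follows. `AppendTarget` and `StreamAction` are hypothetical, simplified stand-ins for MetaModel's `Resource` and `Action<OutputStream>`, and `BufferedCsvAppender` and `CountingTarget` are illustrative names, not existing MetaModel classes. Lines are collected in memory and written in one append operation whenever `batchSize` lines have accumulated; any remainder is written on `close()`.

{code}
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for MetaModel's Action<OutputStream> and Resource.
interface StreamAction {
    void run(OutputStream out) throws Exception;
}

interface AppendTarget {
    void append(StreamAction action);
}

// Test double: counts append operations (each one would be an open/close cycle).
class CountingTarget implements AppendTarget {
    int appendCalls = 0;
    final ByteArrayOutputStream data = new ByteArrayOutputStream();

    @Override
    public void append(StreamAction action) {
        appendCalls++;
        try {
            action.run(data);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

// Sketch of the proposed fix: buffer CSV lines and flush them in one append per batch.
class BufferedCsvAppender implements AutoCloseable {
    private final AppendTarget target;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    BufferedCsvAppender(AppendTarget target, int batchSize) {
        this.target = target;
        this.batchSize = batchSize;
    }

    void insert(String line) {
        buffer.add(line);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // one append operation writes the whole batch
        target.append(out -> {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            for (String line : buffer) {
                writer.write(line);
                writer.write('\n');
            }
            writer.flush();
        });
        buffer.clear(); // skipped if append throws, so lines are not silently dropped
    }

    @Override
    public void close() {
        flush(); // write any lines left over from the last partial batch
    }
}
{code}

With `batchSize` set to 1000, inserting 2,500 lines results in three append operations (1000 + 1000 + 500) instead of 2,500. The trade-off is that buffered lines are not visible in the resource until a flush happens, so the appender has to be closed (or flushed) when the update completes.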
--
This message was sent by Atlassian JIRA
(v6.1#6144)