[
https://issues.apache.org/jira/browse/PHOENIX-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352669#comment-14352669
]
Gabriel Reid commented on PHOENIX-1711:
---------------------------------------
FWIW, my take on this topic in general is that the numbers are pretty much in
line with what I would expect as far as where the work is being done (i.e. 18%
of the time spent in parsing the input, and 39% of the time spent converting
into Phoenix encoding). Seeing as those two tasks are the only real
functionality performed by this tool, I think it's to be expected that they're
taking up ~60% of the execution time. That being said, obviously making things
faster is a good thing (as long as it doesn't come at the cost of breaking
things).
Looking at the patch, I saw the following in
{{org.apache.phoenix.mapreduce.CsvToKeyValueMapper#setup}}
{code}
try {
    csvUpsertExecutor = buildUpsertExecutor(conf);
} catch (SQLException e) {
    e.printStackTrace();
}
{code}
We definitely want to throw that exception up the stack there and not just
print the stack trace, as otherwise {{csvUpsertExecutor}} stays null and this
is just going to lead to an NPE later.
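As a rough illustration of what I mean by throwing it up the stack, here is a minimal, self-contained sketch. It is not the patch itself: {{SetupSketch}}, {{UpsertExecutor}}, the boolean {{fail}} flag and {{isReady}} are hypothetical stand-ins, and wrapping in an {{IOException}} is just one option that fits a setup method declared to throw {{IOException}}.

```java
import java.io.IOException;
import java.sql.SQLException;

class SetupSketch {
    // Hypothetical stand-in for the executor that setup() builds.
    interface UpsertExecutor {
        void execute(String record);
    }

    private UpsertExecutor executor;

    // Hypothetical stand-in for buildUpsertExecutor(conf); fails on request.
    static UpsertExecutor buildUpsertExecutor(boolean fail) throws SQLException {
        if (fail) {
            throw new SQLException("could not build upsert executor");
        }
        return record -> { };
    }

    // Same shape as a mapper setup method that may throw IOException.
    void setup(boolean fail) throws IOException {
        try {
            executor = buildUpsertExecutor(fail);
        } catch (SQLException e) {
            // Fail the task here, keeping the original exception as the cause,
            // instead of continuing with a null executor.
            throw new IOException("Failed to build upsert executor", e);
        }
    }

    boolean isReady() {
        return executor != null;
    }
}
```

With this shape the task fails immediately with the real cause attached, rather than with an unrelated NPE on the first record.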
I almost had the feeling that this patch is the combination of a couple of
patches, could that be? Or are all the changes in there necessary? For example,
is the change in PArrayDataType intended to be in this patch?
Also, considering that the optimization in this change is about speeding up the
following (pseudo-code) calling pattern:
{code}
for listOfValues in input:
    for value in listOfValues:
        preparedStatement.setObject(value)
    preparedStatement.execute()
{code}
would it be possible to apply this fix so that users of the public APIs will also take
advantage of it? I can imagine that there are a lot of realtime ingest use
cases where the same prepared statement is just being used over and over to
ingest data, so I think it would be good if we can minimize the work being done
in (re-)compiling the statement every time there as well.
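For reference, the client-side pattern I have in mind looks roughly like the sketch below, written against plain JDBC only. {{IngestSketch}} and {{ingest}} are hypothetical names, and the sketch assumes auto-commit is off on the connection, so a single explicit commit happens at the end.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class IngestSketch {
    // Prepares the statement once, then binds and executes it for every record.
    // Returns the number of records executed.
    static int ingest(Connection conn, String upsertSql, List<List<Object>> records)
            throws SQLException {
        int executed = 0;
        try (PreparedStatement stmt = conn.prepareStatement(upsertSql)) {
            for (List<Object> record : records) {
                for (int i = 0; i < record.size(); i++) {
                    // JDBC parameter indexes are 1-based.
                    stmt.setObject(i + 1, record.get(i));
                }
                stmt.execute();
                executed++;
            }
        }
        // Assumes auto-commit is disabled on conn.
        conn.commit();
        return executed;
    }
}
```

A caller would pass something like {{"UPSERT INTO SOME_TABLE VALUES (?, ?)"}} (table name made up here) together with its rows. The point is that the same statement object is executed once per row, so any per-execute compilation cost is paid on every row.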
> Improve performance of CSV loader
> ---------------------------------
>
> Key: PHOENIX-1711
> URL: https://issues.apache.org/jira/browse/PHOENIX-1711
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Attachments: PHOENIX-1711.patch
>
>
> Here is a break-up of percentage execution time for some of the steps in the
> mapper:
> csvParser: 18%
> csvUpsertExecutor.execute(ImmutableList.of(csvRecord)): 39%
> PhoenixRuntime.getUncommittedDataIterator(conn, true): 9%
> while (uncommittedDataIterator.hasNext()): 15%
> Read IO & custom processing: 19%
> See details here: http://s.apache.org/6rl
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)