[
https://issues.apache.org/jira/browse/PIG-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550206
]
Olga Natkovich commented on PIG-39:
-----------------------------------
I incorporated the change and ran performance tests. Unfortunately, I did not
see any change in performance. Looking at the Hadoop code, I think it is already
buffering the data, so our code is just reading data that is cached in memory.
I am still going to commit the patch since this is a bug.
> BufferedPositionedInputStream drastically reduces read performance because it
> doesn't override read([], o, l) in InputStream
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-39
> URL: https://issues.apache.org/jira/browse/PIG-39
> Project: Pig
> Issue Type: Bug
> Components: impl
> Environment: Java 1.6, Mac OS X 10.5
> Reporter: Sam Pullara
>
> A simple fix can have a huge effect on the performance of certain kinds of Pig
> programs:
> Index: src/org/apache/pig/impl/io/BufferedPositionedInputStream.java
> ===================================================================
> --- src/org/apache/pig/impl/io/BufferedPositionedInputStream.java (revision 597597)
> +++ src/org/apache/pig/impl/io/BufferedPositionedInputStream.java (working copy)
> @@ -49,7 +49,14 @@
> pos += rc;
> return rc;
> }
> -
> +
> + @Override
> + public int read(byte b[], int off, int len) throws IOException {
> + int read = in.read(b, off, len);
> + pos += read;
> + return read;
> + }
> +
> /**
> * Returns the current position in the tracked InputStream.
> */
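
For context on why the override matters: the default InputStream.read(byte[], int, int) fills the array by calling the single-byte read() once per byte, so a wrapper that does not override it turns every bulk read into a loop of one-byte reads. One detail in the patch as posted may deserve a second look: read returns -1 at end of stream, so "pos += read" would move the position back by one on EOF. The sketch below is a minimal illustration of the same idea with that case guarded. The class name PositionTrackingInputStream and the getPosition() accessor are hypothetical stand-ins, and the "in" and "pos" fields are assumed from the diff; this is not the committed Pig code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for BufferedPositionedInputStream; not the Pig class.
class PositionTrackingInputStream extends InputStream {
    private final InputStream in; // wrapped stream (field name assumed from the diff)
    private long pos;             // number of bytes consumed so far

    PositionTrackingInputStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        int c = in.read();
        if (c != -1) {
            pos++; // advance only when a byte was actually read
        }
        return c;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // Delegate the bulk read so the wrapped stream can fill the array
        // in one call instead of one read() call per byte.
        int n = in.read(b, off, len);
        if (n > 0) {
            pos += n; // skip the update on EOF (-1) and on zero-length reads
        }
        return n;
    }

    // Returns the current position in the tracked stream.
    public long getPosition() {
        return pos;
    }
}
```

Wrapping a ByteArrayInputStream over a small array and reading it in fixed-size chunks until read returns -1 is a quick way to check that the reported position matches the number of bytes consumed.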