[jira] [Commented] (AVRO-2056) DirectBinaryEncoder Creates Buffer For Each Call To writeDouble
[ https://issues.apache.org/jira/browse/AVRO-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096959#comment-16096959 ]

BELUGA BEHR commented on AVRO-2056:
---

With the patch applied, writing Doubles is between 1% and 2% faster and has the added benefit of cutting down on garbage collection

{code}
# Buffer Removed
DoubleWrite: 2762 ms 72.407 579.258 200
DoubleWrite: 2786 ms 71.787 574.293 200
DoubleWrite: 2755 ms 72.570 580.561 200

# Buffer Present
DoubleWrite: 2822 ms 70.871 566.965 200
DoubleWrite: 2830 ms 70.667 565.336 200
DoubleWrite: 2807 ms 71.230 569.842 200
{code}

> DirectBinaryEncoder Creates Buffer For Each Call To writeDouble
> ---
>
> Key: AVRO-2056
> URL: https://issues.apache.org/jira/browse/AVRO-2056
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.7.7, 1.8.2
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Minor
> Attachments: AVRO-2056.1.patch
>
>
> Each call to {{writeDouble}} creates a new buffer and promptly throws it away
> even though the class has a re-usable buffer and is used in other methods
> such as {{writeFloat}}. Remove this extra buffer.
> {code:title=org.apache.avro.io.DirectBinaryEncoder}
>   // the buffer is used for writing floats, doubles, and large longs.
>   private final byte[] buf = new byte[12];
>
>   @Override
>   public void writeFloat(float f) throws IOException {
>     int len = BinaryData.encodeFloat(f, buf, 0);
>     out.write(buf, 0, len);
>   }
>
>   @Override
>   public void writeDouble(double d) throws IOException {
>     byte[] buf = new byte[8];
>     int len = BinaryData.encodeDouble(d, buf, 0);
>     out.write(buf, 0, len);
>   }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
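For readers following AVRO-2056, here is a minimal sketch of the idea behind the patch: drop the per-call allocation in writeDouble and reuse the instance-level scratch buffer that writeFloat already uses. ReusableBufferEncoder and its encodeDouble helper are hypothetical names for illustration only. This is not the Avro class or the attached patch, and the byte layout is written out by hand on the assumption that floats and doubles are emitted as little-endian IEEE 754 bits, as the Avro spec describes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for an unbuffered encoder; not the Avro class itself. */
final class ReusableBufferEncoder {
  private final OutputStream out;

  // One scratch buffer per encoder instance, shared by writeFloat and writeDouble.
  private final byte[] buf = new byte[12];

  ReusableBufferEncoder(OutputStream out) {
    this.out = out;
  }

  /** Writes the IEEE 754 bits of f as 4 little-endian bytes. */
  void writeFloat(float f) throws IOException {
    int bits = Float.floatToRawIntBits(f);
    for (int i = 0; i < 4; i++) {
      buf[i] = (byte) (bits >>> (8 * i));
    }
    out.write(buf, 0, 4);
  }

  /** Writes the IEEE 754 bits of d as 8 little-endian bytes, with no new allocation. */
  void writeDouble(double d) throws IOException {
    long bits = Double.doubleToRawLongBits(d);
    for (int i = 0; i < 8; i++) {
      buf[i] = (byte) (bits >>> (8 * i));
    }
    out.write(buf, 0, 8);
  }

  /** Convenience helper for experiments: encodes a single double into a fresh array. */
  static byte[] encodeDouble(double d) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    new ReusableBufferEncoder(sink).writeDouble(d);
    return sink.toByteArray();
  }
}
```

The trade-off of sharing one scratch buffer is that a single encoder instance must not be used from two threads at once; the sketch assumes callers already respect that.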
[jira] [Updated] (AVRO-2040) Fix Ruby 2.4 deprecation notices
[ https://issues.apache.org/jira/browse/AVRO-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suraj Acharya updated AVRO-2040: Fix Version/s: 1.9.0 > Fix Ruby 2.4 deprecation notices > - > > Key: AVRO-2040 > URL: https://issues.apache.org/jira/browse/AVRO-2040 > Project: Avro > Issue Type: Improvement >Reporter: Tim Perkins >Assignee: Tim Perkins >Priority: Minor > Fix For: 1.9.0 > > > https://github.com/apache/avro/pull/231 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AVRO-2040) Fix Ruby 2.4 deprecation notices
[ https://issues.apache.org/jira/browse/AVRO-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suraj Acharya updated AVRO-2040: Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the patch [~timperkins]
[jira] [Commented] (AVRO-2040) Fix Ruby 2.4 deprecation notices
[ https://issues.apache.org/jira/browse/AVRO-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096912#comment-16096912 ] ASF GitHub Bot commented on AVRO-2040: -- Github user asfgit closed the pull request at: https://github.com/apache/avro/pull/231 > Fix Ruby 2.4 deprecation notices > - > > Key: AVRO-2040 > URL: https://issues.apache.org/jira/browse/AVRO-2040 > Project: Avro > Issue Type: Improvement >Reporter: Tim Perkins >Priority: Minor > > https://github.com/apache/avro/pull/231 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (AVRO-2040) Fix Ruby 2.4 deprecation notices
[ https://issues.apache.org/jira/browse/AVRO-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096911#comment-16096911 ] ASF subversion and git services commented on AVRO-2040: --- Commit b1233fd6468d4de546891cc12d19f905c55604cc in avro's branch refs/heads/master from [~timperkins] [ https://git-wip-us.apache.org/repos/asf?p=avro.git;h=b1233fd ] AVRO-2040: Ruby 2.4 deprecation fixes This closes #231 Signed-off-by: sacharya
[jira] [Assigned] (AVRO-2040) Fix Ruby 2.4 deprecation notices
[ https://issues.apache.org/jira/browse/AVRO-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suraj Acharya reassigned AVRO-2040: --- Assignee: Tim Perkins > Fix Ruby 2.4 deprecation notices > - > > Key: AVRO-2040 > URL: https://issues.apache.org/jira/browse/AVRO-2040 > Project: Avro > Issue Type: Improvement >Reporter: Tim Perkins >Assignee: Tim Perkins >Priority: Minor > > https://github.com/apache/avro/pull/231 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] avro pull request #231: AVRO-2040: Ruby 2.4 deprecation fixes
Github user asfgit closed the pull request at: https://github.com/apache/avro/pull/231 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (AVRO-2040) Fix Ruby 2.4 deprecation notices
[ https://issues.apache.org/jira/browse/AVRO-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096555#comment-16096555 ] Sean Busbey commented on AVRO-2040: --- +1
[jira] [Commented] (AVRO-2041) set up gitbox integration
[ https://issues.apache.org/jira/browse/AVRO-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096395#comment-16096395 ] Ryan Blue commented on AVRO-2041: - Done: https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-14667 > set up gitbox integration > - > > Key: AVRO-2041 > URL: https://issues.apache.org/jira/browse/AVRO-2041 > Project: Avro > Issue Type: Task > Components: community >Reporter: Sean Busbey >Assignee: Sean Busbey > > We got consensus back in may about [turning on gitbox > integration|https://lists.apache.org/thread.html/cdd8ba14c1bf8aca2d71d09862e9780f2dc46af414ed78b1e3fd9c56@%3Cdev.avro.apache.org%3E] > so do it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AVRO-2048) Avro Binary Decoding - Gracefully Handle Long Strings
[ https://issues.apache.org/jira/browse/AVRO-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated AVRO-2048:
--
    Attachment: AVRO-2048.2.patch

Fixed checkstyle failures

> Avro Binary Decoding - Gracefully Handle Long Strings
> -
>
> Key: AVRO-2048
> URL: https://issues.apache.org/jira/browse/AVRO-2048
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.7.7, 1.8.2
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Minor
> Attachments: AVRO-2048.1.patch, AVRO-2048.2.patch
>
>
> According to the
> [specs|https://avro.apache.org/docs/1.8.2/spec.html#binary_encode_primitive]:
> bq. a string is encoded as a *long* followed by that many bytes of UTF-8
> encoded character data.
> However, that is currently not being adhered to:
> {code:title=org.apache.avro.io.BinaryDecoder}
>   @Override
>   public Utf8 readString(Utf8 old) throws IOException {
>     int length = readInt();
>     Utf8 result = (old != null ? old : new Utf8());
>     result.setByteLength(length);
>     if (0 != length) {
>       doReadBytes(result.getBytes(), 0, length);
>     }
>     return result;
>   }
> {code}
> The first thing the code does here is to load an *int* value, not a *long*.
> Because of the variable length nature of the size, this will mostly work.
> However, there may be edge-cases where the serializer is putting in large
> length values erroneously or nefariously. Let us gracefully detect such
> scenarios and more closely adhere to the spec.
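As an illustration of what AVRO-2048 asks for, the sketch below reads the string length as a zig-zag, variable-length *long* and rejects values that are negative or larger than a caller-supplied limit, instead of decoding an *int* directly. LongLengthReader, readStringLength, maxLength and encodeLong are hypothetical names invented for this sketch. It does not reproduce BinaryDecoder or the attached patches, and the error messages are placeholders.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper; not part of org.apache.avro.io. */
final class LongLengthReader {

  /** Decodes one zig-zag, variable-length long (at most 10 bytes). */
  static long readLong(InputStream in) throws IOException {
    long raw = 0;
    int shift = 0;
    while (true) {
      int b = in.read();
      if (b < 0) {
        throw new EOFException("truncated variable-length long");
      }
      raw |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        break; // high bit clear: this was the last byte
      }
      shift += 7;
      if (shift > 63) {
        throw new IOException("variable-length long exceeds 10 bytes");
      }
    }
    return (raw >>> 1) ^ -(raw & 1); // undo zig-zag encoding
  }

  /** Reads a string length as a long and fails cleanly if it cannot be honoured. */
  static int readStringLength(InputStream in, int maxLength) throws IOException {
    long length = readLong(in);
    if (length < 0) {
      throw new IOException("Malformed data: negative string length " + length);
    }
    if (length > maxLength) {
      throw new IOException("String length " + length + " exceeds limit " + maxLength);
    }
    return (int) length; // safe: 0 <= length <= maxLength <= Integer.MAX_VALUE
  }

  /** Test helper: zig-zag, variable-length encoding of a long. */
  static byte[] encodeLong(long value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    long n = (value << 1) ^ (value >> 63);
    while ((n & ~0x7FL) != 0) {
      out.write((int) ((n & 0x7F) | 0x80));
      n >>>= 7;
    }
    out.write((int) n);
    return out.toByteArray();
  }
}
```

With this shape, a length such as 2^32 written by a faulty or hostile serializer surfaces as an IOException naming the offending value, rather than being interpreted as some unrelated int.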
[jira] [Updated] (AVRO-2054) Use StringBuilder instead of StringBuffer
[ https://issues.apache.org/jira/browse/AVRO-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated AVRO-2054:
--
    Attachment: AVRO-2054.2.patch

Renamed the automatic variable from _buffer_ to _builder_.

> Use StringBuilder instead of StringBuffer
> -
>
> Key: AVRO-2054
> URL: https://issues.apache.org/jira/browse/AVRO-2054
> Project: Avro
> Issue Type: Improvement
> Affects Versions: 1.7.7, 1.8.2
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Trivial
> Attachments: AVRO-2054.1.patch, AVRO-2054.2.patch
>
>
> Use the un-synchronized StringBuilder instead of StringBuffer. Use _char_
> values instead of Strings.
> {code:title=org.apache.trevni.MetaData}
>   @Override public String toString() {
>     StringBuffer buffer = new StringBuffer();
>     buffer.append("{ ");
>     for (Map.Entry e : entrySet()) {
>       buffer.append(e.getKey());
>       buffer.append("=");
>       try {
>         buffer.append(new String(e.getValue(), "ISO-8859-1"));
>       } catch (java.io.UnsupportedEncodingException error) {
>         throw new TrevniRuntimeException(error);
>       }
>       buffer.append(" ");
>     }
>     buffer.append("}");
>     return buffer.toString();
>   }
> {code}
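A rough sketch of the change AVRO-2054 describes, for readers without the patch to hand: the same toString logic built on the un-synchronized StringBuilder, with single characters appended as _char_ values. MetaDataSketch is a hypothetical stand-in that is assumed to be a map from String to byte[]; it is not the Trevni class. It also swaps the charset-name string for StandardCharsets.ISO_8859_1 (Java 7 or later), which removes the checked exception. That swap goes beyond what the issue text asks for and may not match the attached patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a metadata map; not org.apache.trevni.MetaData. */
final class MetaDataSketch extends LinkedHashMap<String, byte[]> {
  private static final long serialVersionUID = 1L;

  @Override
  public String toString() {
    // StringBuilder has the same append API as StringBuffer but no locking.
    StringBuilder builder = new StringBuilder();
    builder.append("{ ");
    for (Map.Entry<String, byte[]> e : entrySet()) {
      builder.append(e.getKey());
      builder.append('='); // char overload instead of a one-character String
      // Assumption: the Charset constant replaces the "ISO-8859-1" name lookup.
      builder.append(new String(e.getValue(), StandardCharsets.ISO_8859_1));
      builder.append(' ');
    }
    builder.append('}');
    return builder.toString();
  }
}
```

Because the builder is a local variable that never escapes the method, dropping synchronization does not change behaviour for callers.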
[jira] [Issue Comment Deleted] (AVRO-2056) DirectBinaryEncoder Creates Buffer For Each Call To writeDouble
[ https://issues.apache.org/jira/browse/AVRO-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated AVRO-2056: -- Comment: was deleted (was: Renamed the automatic variable from _buffer_ to _builder_.)
[jira] [Updated] (AVRO-2056) DirectBinaryEncoder Creates Buffer For Each Call To writeDouble
[ https://issues.apache.org/jira/browse/AVRO-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated AVRO-2056: -- Attachment: AVRO-2054.2.patch Renamed the automatic variable from _buffer_ to _builder_.
[jira] [Updated] (AVRO-2056) DirectBinaryEncoder Creates Buffer For Each Call To writeDouble
[ https://issues.apache.org/jira/browse/AVRO-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated AVRO-2056: -- Attachment: (was: AVRO-2054.2.patch)
[jira] [Commented] (AVRO-1720) Add an avro-tool to count records in an avro file
[ https://issues.apache.org/jira/browse/AVRO-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096128#comment-16096128 ]

Janosch Woschitz commented on AVRO-1720:
---

Just as a follow-up, since there does not seem to be much progress on this ticket: for the moment I have made a separate binary available which allows efficient and convenient counting of the records contained in a single avro file or in a folder containing several avro files.

The binaries, documentation and source are available here: https://github.com/jwoschitz/avrocount

This project tries to fill this gap (at least) until similar functionality is provided by avro-tools. Over time there have also been several improvements to this project compared with the original patch. It would be great if these improvements also found their way back into the Apache Avro project in the long term. Until then this project can be used in addition to the existing avro-tools.

> Add an avro-tool to count records in an avro file
> -
>
> Key: AVRO-1720
> URL: https://issues.apache.org/jira/browse/AVRO-1720
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Janosch Woschitz
> Priority: Minor
> Labels: starter
> Attachments: AVRO-1720.patch, AVRO-1720-with-extended-unittests.patch
>
>
> If you're dealing with bigger avro files (>100MB) it would be nice to have a
> way to quickly count the number of records contained within that file.
> With the current state of avro-tools the only way to achieve this (to my
> current knowledge) is to dump the data to json and count the number of
> records. For bigger files this might take a while due to the serialization
> overhead and since every record needs to be looked at.
> I added a new tool which is optimized for counting records, it does not
> serialize the records and reads only the block count for each block.
> {panel:title=Naive benchmark}
> {noformat}
> # the input file had a size of ~300MB
> $ du -sh sample.avro
> 323M    sample.avro
> # using the new count tool
> $ time java -jar avro-tools.jar count sample.avro
> 331439
> real    0m4.670s
> user    0m6.167s
> sys     0m0.513s
> # the current way of counting records
> $ time java -jar avro-tools.jar tojson sample.avro | wc
> 331439 54904484 1838231743
> real    0m52.760s
> user    1m42.317s
> sys     0m3.209s
> # the overhead of wc is rather minor
> $ time java -jar avro-tools.jar tojson sample.avro > /dev/null
> real    0m47.834s
> user    0m53.317s
> sys     0m1.194s
> {noformat}
> {panel}
> This tool uses the HDFS API to handle files from any supported filesystem. I
> added the unit tests to the already existing TestDataFileTools since it
> provided convenient utility functions which I could reuse for my test
> scenarios.
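To make the "reads only the block count for each block" idea from AVRO-1720 concrete, here is a self-contained sketch that sums block counts without decoding any record. It hand-parses the object container layout as laid out in the Avro spec: a 4-byte magic, a metadata map and a 16-byte sync marker, followed by blocks that each carry a record count, a byte size, the data and a sync marker. BlockCountSketch and its methods are hypothetical names. The real patch and the avrocount project presumably rely on the Avro Java file reader rather than hand-parsing, so treat this only as an illustration of why the approach is cheap.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch; not the avro-tools count implementation. */
final class BlockCountSketch {
  private static final int SYNC_SIZE = 16;

  /** Finishes decoding a zig-zag varint long whose first byte was already read. */
  static long readLong(DataInputStream in, int first) throws IOException {
    long raw = first & 0x7F;
    int shift = 7;
    int b = first;
    while ((b & 0x80) != 0) {
      if (shift > 63) {
        throw new IOException("variable-length long exceeds 10 bytes");
      }
      b = in.readUnsignedByte(); // throws EOFException if the file is truncated
      raw |= (long) (b & 0x7F) << shift;
      shift += 7;
    }
    return (raw >>> 1) ^ -(raw & 1);
  }

  static long readLong(DataInputStream in) throws IOException {
    return readLong(in, in.readUnsignedByte());
  }

  /** Test helper: zig-zag varint encoding of a long. */
  static void writeLong(OutputStream out, long value) throws IOException {
    long n = (value << 1) ^ (value >> 63);
    while ((n & ~0x7FL) != 0) {
      out.write((int) ((n & 0x7F) | 0x80));
      n >>>= 7;
    }
    out.write((int) n);
  }

  static void skipFully(DataInputStream in, long n) throws IOException {
    while (n > 0) {
      long skipped = in.skip(n);
      if (skipped <= 0) {
        in.readUnsignedByte(); // forces an EOFException at end of stream
        skipped = 1;
      }
      n -= skipped;
    }
  }

  /** Sums the per-block record counts; record bytes are skipped, never decoded. */
  static long countRecords(InputStream raw) throws IOException {
    DataInputStream in = new DataInputStream(raw);
    byte[] magic = new byte[4];
    in.readFully(magic);
    if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
      throw new IOException("Not an Avro data file");
    }
    // Header metadata: a map written as blocks of (string key, bytes value) pairs.
    for (long n = readLong(in); n != 0; n = readLong(in)) {
      if (n < 0) {
        readLong(in); // a negative count is followed by the block size in bytes
        n = -n;
      }
      for (long i = 0; i < n; i++) {
        skipFully(in, readLong(in)); // key
        skipFully(in, readLong(in)); // value
      }
    }
    skipFully(in, SYNC_SIZE);
    long total = 0;
    for (int first = in.read(); first >= 0; first = in.read()) {
      total += readLong(in, first); // number of records in this block
      skipFully(in, readLong(in)); // serialized (possibly compressed) records
      skipFully(in, SYNC_SIZE);
    }
    return total;
  }
}
```

Because the block size is the size after any codec has been applied, this walk never decompresses or deserializes a record, which is consistent with the gap between roughly 5 seconds and 50 seconds in the benchmark above.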
[jira] [Updated] (AVRO-2043) Move to java8
[ https://issues.apache.org/jira/browse/AVRO-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Szadovszky updated AVRO-2043:
---
      Resolution: Fixed
    Hadoop Flags: Incompatible change
    Release Note: Avro java source and target levels are now 1.8
          Status: Resolved (was: Patch Available)

> Move to java8
> -
>
> Key: AVRO-2043
> URL: https://issues.apache.org/jira/browse/AVRO-2043
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.9.0
> Reporter: Gabor Szadovszky
> Assignee: Gabor Szadovszky
> Priority: Blocker
> Fix For: 1.9.0
>
>
> java6 is really old and java7 is already EOD for 2yrs. It is time to move to
> java8 in the next major release (1.9.0):
> * update source/target to 1.8.0 in the pom.xml
> * update the Dockerfile to use jdk8
> * ensure that lang/java builds fine with jdk8
> * ensure that all unit tests pass in lang/java with jdk8
> * ensure that everything builds fine and tests pass by using the new docker
> image
[jira] [Commented] (AVRO-2043) Move to java8
[ https://issues.apache.org/jira/browse/AVRO-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095967#comment-16095967 ] ASF GitHub Bot commented on AVRO-2043: -- Github user asfgit closed the pull request at: https://github.com/apache/avro/pull/232
[GitHub] avro pull request #232: AVRO-2043: Move to java8
Github user asfgit closed the pull request at: https://github.com/apache/avro/pull/232
[jira] [Commented] (AVRO-2043) Move to java8
[ https://issues.apache.org/jira/browse/AVRO-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095966#comment-16095966 ] ASF subversion and git services commented on AVRO-2043: --- Commit f72b620b9d700cf5f2febb0022079b69b9c9577e in avro's branch refs/heads/master from [~gszadovszky] [ https://git-wip-us.apache.org/repos/asf?p=avro.git;h=f72b620 ] AVRO-2043: Move to java8 This closes #232