Re: Drop hadoop1 support in Avro 1.9

2017-07-27 Thread Sean Busbey
+1 sounds great. We can let user@avro know about the plan once we have the
change in place, so there's still time if someone has a compelling reason why
staying on Avro 1.7 or 1.8 can't work for hadoop1 folks.

On Thu, Jul 27, 2017 at 12:52 AM, Gabor Szadovszky  wrote:

> Hi All,
>
> It has already appeared in some discussions/code reviews that we might not
> want to support hadoop1 anymore. As it would be a breaking change I would
> suggest removing the hadoop1 support in the next major release 1.9.0.
> What do you think?
>
> Cheers,
> Gabor
>



-- 
busbey


[jira] [Commented] (AVRO-2058) ReflectData#isNonStringMap returns true for Utf8 keys

2017-07-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103636#comment-16103636
 ] 

ASF GitHub Bot commented on AVRO-2058:
--

GitHub user samschlegel opened a pull request:

https://github.com/apache/avro/pull/237

[AVRO-2058] ReflectData#isNonStringMap returns true for Utf8 keys

https://issues.apache.org/jira/browse/AVRO-2058

Since `Utf8` does not have a `Stringable` annotation, and is not in 
`SpecificData#stringableClasses`, `ReflectData#isNonStringMap` returns true. 
This also causes `ReflectData#isArray` to return true for maps with Utf8 keys, 
and thus `GenericData#resolveUnion` fails as well. This ultimately causes 
`ReflectData#write` to fail for schemas that contain a union that contains a 
map, where the data uses Utf8 for strings.

The following test case reproduces the issue:

```java
  @Test
  public void testUnionWithMapWithUtf8Keys() {
    Schema s = new Schema.Parser().parse(
        "[\"null\", {\"type\":\"map\",\"values\":\"float\"}]");
    GenericData data = ReflectData.get();
    HashMap<Utf8, Float> map = new HashMap<>();
    map.put(new Utf8("foo"), 1.0f);
    assertEquals(1, data.resolveUnion(s, map));
  }
```
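For context, here is a standalone sketch of the misclassification (illustrative only; `NonStringMapSketch` and `FakeUtf8` are stand-ins, not Avro's actual code). A key check that only recognizes `java.lang.String` treats a `Utf8`-keyed map as a non-string map; widening the check to `CharSequence` (which `Utf8` implements) would accept it. Note the PR's actual fix instead annotates `Utf8` as `Stringable`; this sketch only shows the shape of the bug.

```java
import java.util.Map;

class NonStringMapSketch {
  // Stand-in for org.apache.avro.util.Utf8, which implements CharSequence.
  static final class FakeUtf8 implements CharSequence {
    private final String s;
    FakeUtf8(String s) { this.s = s; }
    public int length() { return s.length(); }
    public char charAt(int i) { return s.charAt(i); }
    public CharSequence subSequence(int a, int b) { return s.subSequence(a, b); }
    public String toString() { return s; }
  }

  // Buggy shape: only java.lang.String keys count as "string keys",
  // so a Utf8-keyed map is reported as a non-string map.
  static boolean isNonStringMapBuggy(Map<?, ?> m) {
    for (Object k : m.keySet()) {
      if (!(k instanceof String)) return true;
    }
    return false;
  }

  // Widened shape: any CharSequence key (String, Utf8, ...) is a string key.
  static boolean isNonStringMapWidened(Map<?, ?> m) {
    for (Object k : m.keySet()) {
      if (!(k instanceof CharSequence)) return true;
    }
    return false;
  }
}
```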


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/samschlegel/avro master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/avro/pull/237.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #237


commit 8852da40e7a0ee58d9b027173a4972e7e71b432d
Author: Sam Schlegel 
Date:   2017-07-27T02:20:51Z

Add failing test

commit 15fbb40944dc15a37569aa6a3d2e93db0e085af0
Author: Sam Schlegel 
Date:   2017-07-27T02:23:21Z

Add Stringable annotation




> ReflectData#isNonStringMap returns true for Utf8 keys
> -
>
> Key: AVRO-2058
> URL: https://issues.apache.org/jira/browse/AVRO-2058
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2
>Reporter: Sam Schlegel
>Priority: Critical
> Attachments: AVRO-2058.patch
>
>
> Since {{Utf8}} does not have a {{Stringable}} annotation, and is not in 
> {{SpecificData#stringableClasses}}, {{ReflectData#isNonStringMap}} returns 
> true. This also causes {{ReflectData#isArray}} to return true for maps with 
> Utf8 keys, and thus {{GenericData#resolveUnion}} fails as well. This 
> ultimately causes {{ReflectData#write}} to fail for schemas that contain a 
> union that contains a map, where the data uses Utf8 for strings.
> The following test case reproduces the issue:
> {code:java}
>   @Test
>   public void testUnionWithMapWithUtf8Keys() {
>     Schema s = new Schema.Parser().parse(
>         "[\"null\", {\"type\":\"map\",\"values\":\"float\"}]");
>     GenericData data = ReflectData.get();
>     HashMap<Utf8, Float> map = new HashMap<>();
>     map.put(new Utf8("foo"), 1.0f);
>     assertEquals(1, data.resolveUnion(s, map));
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] avro pull request #237: [AVRO-2058] ReflectData#isNonStringMap returns true ...

2017-07-27 Thread samschlegel
GitHub user samschlegel opened a pull request:

https://github.com/apache/avro/pull/237

[AVRO-2058] ReflectData#isNonStringMap returns true for Utf8 keys

https://issues.apache.org/jira/browse/AVRO-2058

Since `Utf8` does not have a `Stringable` annotation, and is not in 
`SpecificData#stringableClasses`, `ReflectData#isNonStringMap` returns true. 
This also causes `ReflectData#isArray` to return true for maps with Utf8 keys, 
and thus `GenericData#resolveUnion` fails as well. This ultimately causes 
`ReflectData#write` to fail for schemas that contain a union that contains a 
map, where the data uses Utf8 for strings.

The following test case reproduces the issue:

```java
  @Test
  public void testUnionWithMapWithUtf8Keys() {
    Schema s = new Schema.Parser().parse(
        "[\"null\", {\"type\":\"map\",\"values\":\"float\"}]");
    GenericData data = ReflectData.get();
    HashMap<Utf8, Float> map = new HashMap<>();
    map.put(new Utf8("foo"), 1.0f);
    assertEquals(1, data.resolveUnion(s, map));
  }
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/samschlegel/avro master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/avro/pull/237.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #237


commit 8852da40e7a0ee58d9b027173a4972e7e71b432d
Author: Sam Schlegel 
Date:   2017-07-27T02:20:51Z

Add failing test

commit 15fbb40944dc15a37569aa6a3d2e93db0e085af0
Author: Sam Schlegel 
Date:   2017-07-27T02:23:21Z

Add Stringable annotation






Re: Drop hadoop1 support in Avro 1.9

2017-07-27 Thread Ryan Blue
+1

On Thu, Jul 27, 2017 at 8:45 AM, Ben McCann  wrote:

> +1
>
>
> 
> From: Nandor Kollar 
> Sent: Thursday, July 27, 2017 1:36:15 AM
> To: dev@avro.apache.org
> Subject: Re: Drop hadoop1 support in Avro 1.9
>
> I also think it is a good idea!
>
> On Thu, Jul 27, 2017 at 10:27 AM, Niels Basjes  wrote:
>
> > Yes. Good idea.
> > +1
> >
> > On Jul 27, 2017 09:52, "Gabor Szadovszky"  wrote:
> >
> > > Hi All,
> > >
> > > It has already appeared in some discussions/code reviews that we might
> > not
> > > want to support hadoop1 anymore. As it would be a breaking change I
> would
> > > suggest removing the hadoop1 support in the next major release 1.9.0.
> > > What do you think?
> > >
> > > Cheers,
> > > Gabor
> > >
> >
>



-- 
Ryan Blue
Software Engineer
Netflix


Re: Drop hadoop1 support in Avro 1.9

2017-07-27 Thread Niels Basjes
Yes. Good idea.
+1

On Jul 27, 2017 09:52, "Gabor Szadovszky"  wrote:

> Hi All,
>
> It has already appeared in some discussions/code reviews that we might not
> want to support hadoop1 anymore. As it would be a breaking change I would
> suggest removing the hadoop1 support in the next major release 1.9.0.
> What do you think?
>
> Cheers,
> Gabor
>


[jira] [Commented] (AVRO-2055) Remove Magic Value From org.apache.avro.hadoop.io.AvroSequenceFile

2017-07-27 Thread Gabor Szadovszky (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102915#comment-16102915
 ] 

Gabor Szadovszky commented on AVRO-2055:


[~belugabehr], sorry, I've just realised while doing a build with your patches 
that we are in the same situation as with AVRO-2053. It does not even compile, 
as we are using hadoop-1 by default. So, let's wait to see if we can get rid of 
Hadoop 1 in 1.9.

> Remove Magic Value From org.apache.avro.hadoop.io.AvroSequenceFile
> --
>
> Key: AVRO-2055
> URL: https://issues.apache.org/jira/browse/AVRO-2055
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.7.7, 1.8.2
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: AVRO-2055.1.patch
>
>
> Remove magic string _io.file.buffer.size_ and _DEFAULT_BUFFER_SIZE_BYTES_ and 
> instead rely on the Hadoop libraries to provide this information.  Will help 
> to keep Avro in sync with changes in Hadoop.





Drop hadoop1 support in Avro 1.9

2017-07-27 Thread Gabor Szadovszky
Hi All,

It has already appeared in some discussions/code reviews that we might not
want to support hadoop1 anymore. As it would be a breaking change I would
suggest removing the hadoop1 support in the next major release 1.9.0.
What do you think?

Cheers,
Gabor


[jira] [Commented] (AVRO-2056) DirectBinaryEncoder Creates Buffer For Each Call To writeDouble

2017-07-27 Thread Gabor Szadovszky (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102845#comment-16102845
 ] 

Gabor Szadovszky commented on AVRO-2056:


[~belugabehr], I've taken a look at Perf and I see how it works now. There is 
no relevant change there; it is more for testing your stuff offline, like in 
an IDE...
+1

> DirectBinaryEncoder Creates Buffer For Each Call To writeDouble
> ---
>
> Key: AVRO-2056
> URL: https://issues.apache.org/jira/browse/AVRO-2056
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.7.7, 1.8.2
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: AVRO-2056.1.patch
>
>
> Each call to {{writeDouble}} creates a new buffer and promptly throws it away 
> even though the class has a re-usable buffer and is used in other methods 
> such as {{writeFloat}}.  Remove this extra buffer.
> {code:title=org.apache.avro.io.DirectBinaryEncoder}
>   // the buffer is used for writing floats, doubles, and large longs.
>   private final byte[] buf = new byte[12];
>   @Override
>   public void writeFloat(float f) throws IOException {
> int len = BinaryData.encodeFloat(f, buf, 0);
> out.write(buf, 0, len);
>   }
>   @Override
>   public void writeDouble(double d) throws IOException {
> byte[] buf = new byte[8];
> int len = BinaryData.encodeDouble(d, buf, 0);
> out.write(buf, 0, len);
>   }
> {code}
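A standalone sketch of the fix the ticket asks for (illustrative only, not the actual patch; it collects output in a `ByteArrayOutputStream` so the sketch stays free of checked exceptions). Per the Avro spec, a double is the 8 bytes of `Double.doubleToRawLongBits`, little-endian; the point here is that `writeDouble` fills the shared scratch buffer, mirroring `writeFloat`, instead of allocating a new array per call:

```java
import java.io.ByteArrayOutputStream;

class DirectDoubleEncoderSketch {
  // Shared scratch buffer, as in DirectBinaryEncoder (floats, doubles, large longs).
  private final byte[] buf = new byte[12];
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();

  void writeDouble(double d) {
    long bits = Double.doubleToRawLongBits(d);
    for (int i = 0; i < 8; i++) {
      buf[i] = (byte) (bits >>> (8 * i)); // fill the reused buffer, no per-call allocation
    }
    out.write(buf, 0, 8);
  }

  byte[] toByteArray() {
    return out.toByteArray();
  }
}
```

For example, writing 1.0 (bit pattern 0x3FF0000000000000) emits the little-endian bytes 00 00 00 00 00 00 F0 3F.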





[jira] [Commented] (AVRO-2049) Remove Superfluous Configuration From AvroSerializer

2017-07-27 Thread Gabor Szadovszky (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102841#comment-16102841
 ] 

Gabor Szadovszky commented on AVRO-2049:


+1 will commit tomorrow if no more comments

[~nkollar], I did some research related to the usage of 
{{Serializer.open(OutputStream)}} and it seems they are not really reused. We 
might pass the previous {{encoder}} to 
{{EncoderFactory.binaryEncoder(OutputStream, BinaryEncoder)}} to reuse it, though. 
However, I would recommend having another JIRA for that.

> Remove Superfluous Configuration From AvroSerializer
> 
>
> Key: AVRO-2049
> URL: https://issues.apache.org/jira/browse/AVRO-2049
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.7.7, 1.8.2
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: AVRO-2049.1.patch, AVRO-2049.2.patch, AVRO-2049.3.patch
>
>
> In the class {{org.apache.avro.hadoop.io.AvroSerializer}}, we see that the 
> Avro block size is configured with a hard-coded value and there is a request 
> to benchmark different buffer sizes.
> {code:title=org.apache.avro.hadoop.io.AvroSerializer}
>   /**
>* The block size for the Avro encoder.
>*
>* This number was copied from the AvroSerialization of 
> org.apache.avro.mapred in Avro 1.5.1.
>*
>* TODO(gwu): Do some benchmarking with different numbers here to see if it 
> is important.
>*/
>   private static final int AVRO_ENCODER_BLOCK_SIZE_BYTES = 512;
>   /** A factory for creating Avro datum encoders. */
>   private static EncoderFactory mEncoderFactory
>   = new 
> EncoderFactory().configureBlockSize(AVRO_ENCODER_BLOCK_SIZE_BYTES);
> {code}
> However, there is no need to benchmark: this setting is superfluous and is 
> ignored by the current implementation.
> {code:title=org.apache.avro.hadoop.io.AvroSerializer}
>   @Override
>   public void open(OutputStream outputStream) throws IOException {
> mOutputStream = outputStream;
> mAvroEncoder = mEncoderFactory.binaryEncoder(outputStream, mAvroEncoder);
>   }
> {code}
> {{org.apache.avro.io.EncoderFactory.binaryEncoder}} ignores this setting.  
> This setting is only relevant for calls to 
> {{org.apache.avro.io.EncoderFactory.blockingBinaryEncoder}} 
>  which considers the configured "Block Size" for doing binary encoding of 
> blocked Array types as laid out in the 
> [specs|https://avro.apache.org/docs/1.8.2/spec.html#binary_encode_complex].  
> It can simply be removed.





[jira] [Commented] (AVRO-2048) Avro Binary Decoding - Gracefully Handle Long Strings

2017-07-27 Thread Gabor Szadovszky (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102826#comment-16102826
 ] 

Gabor Szadovszky commented on AVRO-2048:


Thanks, [~belugabehr], I'll wait another day for additional comments before 
committing.
+1

> Avro Binary Decoding - Gracefully Handle Long Strings
> -
>
> Key: AVRO-2048
> URL: https://issues.apache.org/jira/browse/AVRO-2048
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.7.7, 1.8.2
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: AVRO-2048.1.patch, AVRO-2048.2.patch, AVRO-2048.3.patch
>
>
> According to the 
> [specs|https://avro.apache.org/docs/1.8.2/spec.html#binary_encode_primitive]:
> bq. a string is encoded as a *long* followed by that many bytes of UTF-8 
> encoded character data.
> However, that is currently not being adhered to:
> {code:title=org.apache.avro.io.BinaryDecoder}
>   @Override
>   public Utf8 readString(Utf8 old) throws IOException {
> int length = readInt();
> Utf8 result = (old != null ? old : new Utf8());
> result.setByteLength(length);
> if (0 != length) {
>   doReadBytes(result.getBytes(), 0, length);
> }
> return result;
>   }
> {code}
> The first thing the code does here is to load an *int* value, not a *long*.  
> Because of the variable-length nature of the size, this will mostly work.  
> However, there may be edge cases where the serializer is putting in large 
> length values erroneously or nefariously. Let us gracefully detect such 
> scenarios and more closely adhere to the spec.
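A standalone sketch of the graceful handling being requested (illustrative names, byte-array input, and unchecked exceptions for brevity; the real decoder reads from a stream and uses its own exception types). Per the spec, the length is a zig-zag varint *long*; the sketch decodes it as a long and rejects values that cannot index a Java array before allocating anything:

```java
class StringLengthSketch {
  // Decode a zig-zag varint long starting at offset 0 of the input.
  static long readZigZagLong(byte[] in) {
    long n = 0;
    int shift = 0;
    int pos = 0;
    int b;
    do {
      if (pos >= in.length) {
        throw new IllegalArgumentException("EOF in the middle of a varint");
      }
      b = in[pos++] & 0xFF;
      n |= (long) (b & 0x7F) << shift; // accumulate 7 payload bits per byte
      shift += 7;
    } while ((b & 0x80) != 0); // high bit set means more bytes follow
    return (n >>> 1) ^ -(n & 1); // undo zig-zag encoding
  }

  // Read a string length as a long and validate it before any allocation.
  static int readStringLength(byte[] in) {
    long length = readZigZagLong(in);
    if (length < 0 || length > Integer.MAX_VALUE - 8) {
      throw new IllegalArgumentException("Invalid string length: " + length);
    }
    return (int) length;
  }
}
```

For example, the single byte 0x06 zig-zag decodes to 3 and is accepted, while 0x01 decodes to -1 and is rejected before any buffer is sized from it.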


