[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-09-11 Thread Anthony Dotterer (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161639#comment-16161639
 ] 

Anthony Dotterer commented on SPARK-20958:
--

For those not well versed in sbt shading, the following would pin your project 
at parquet-avro 1.8.1.

In your project/plugins.sbt:
{code}
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
{code}

In your build.sbt:
{code}
libraryDependencies += ("org.apache.parquet" % "parquet-avro" % 
"1.8.1").intransitive()

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.parquet.avro.**" -> "shade.parquet.avro.@1")
.inLibrary("org.apache.parquet" % "parquet-avro" % "1.8.1")
.inProject
)
{code}

> Roll back parquet-mr 1.8.2 to parquet-1.8.1
> ---
>
> Key: SPARK-20958
> URL: https://issues.apache.org/jira/browse/SPARK-20958
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>  Labels: release-notes, release_notes, releasenotes
>
> We recently realized that parquet-mr 1.8.2 used by Spark 2.2.0-rc2 depends on 
> avro 1.8.1, which is incompatible with avro 1.7.6 used by parquet-mr 1.8.1 
> and avro 1.7.7 used by spark-core 2.2.0-rc2.
> Basically, Spark 2.2.0-rc2 introduced two incompatible versions of avro 
> (1.7.7 and 1.8.1). Upgrading avro 1.7.7 to 1.8.1 is not preferable due to the 
> reasons mentioned in [PR 
> #17163|https://github.com/apache/spark/pull/17163#issuecomment-286563131]. 
> Therefore, we don't really have many choices here and have to roll back 
> parquet-mr 1.8.2 to 1.8.1 to resolve this dependency conflict.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-09-08 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158924#comment-16158924
 ] 

Ryan Blue commented on SPARK-20958:
---

[~spiricalsalsaz], you need to only pin parquet-avro, not the other Parquet 
libs. This is caused by a bug in Parquet that has been fixed in 1.8.2, so you 
want the 1.8.2 version of parquet-hadoop, but the 1.8.1 version of 
parquet-avro. Alternatively, you can shade and relocate the version of Avro you 
want and use parquet-avro 1.8.2. That's what I'd recommend.

> Roll back parquet-mr 1.8.2 to parquet-1.8.1
> ---
>
> Key: SPARK-20958
> URL: https://issues.apache.org/jira/browse/SPARK-20958
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>  Labels: release-notes, release_notes, releasenotes
>
> We recently realized that parquet-mr 1.8.2 used by Spark 2.2.0-rc2 depends on 
> avro 1.8.1, which is incompatible with avro 1.7.6 used by parquet-mr 1.8.1 
> and avro 1.7.7 used by spark-core 2.2.0-rc2.
> Basically, Spark 2.2.0-rc2 introduced two incompatible versions of avro 
> (1.7.7 and 1.8.1). Upgrading avro 1.7.7 to 1.8.1 is not preferable due to the 
> reasons mentioned in [PR 
> #17163|https://github.com/apache/spark/pull/17163#issuecomment-286563131]. 
> Therefore, we don't really have many choices here and have to roll back 
> parquet-mr 1.8.2 to 1.8.1 to resolve this dependency conflict.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-09-06 Thread Anthony Dotterer (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155387#comment-16155387
 ] 

Anthony Dotterer commented on SPARK-20958:
--

As a user of Spark 2.2.0 that mixes usage of parquet-avro and avro, here are 
some exceptions that I had.  This will hopefully make search engines find this 
library conflict more quickly for others.

{code}
java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
at 
org.apache.parquet.avro.AvroParquetWriter.writeSupport(AvroParquetWriter.java:144)
at 
org.apache.parquet.avro.AvroParquetWriter.access$100(AvroParquetWriter.java:35)
at 
org.apache.parquet.avro.AvroParquetWriter$Builder.getWriteSupport(AvroParquetWriter.java:173)
...
Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
{code}


> Roll back parquet-mr 1.8.2 to parquet-1.8.1
> ---
>
> Key: SPARK-20958
> URL: https://issues.apache.org/jira/browse/SPARK-20958
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>  Labels: release-notes, release_notes, releasenotes
>
> We recently realized that parquet-mr 1.8.2 used by Spark 2.2.0-rc2 depends on 
> avro 1.8.1, which is incompatible with avro 1.7.6 used by parquet-mr 1.8.1 
> and avro 1.7.7 used by spark-core 2.2.0-rc2.
> Basically, Spark 2.2.0-rc2 introduced two incompatible versions of avro 
> (1.7.7 and 1.8.1). Upgrading avro 1.7.7 to 1.8.1 is not preferable due to the 
> reasons mentioned in [PR 
> #17163|https://github.com/apache/spark/pull/17163#issuecomment-286563131]. 
> Therefore, we don't really have many choices here and have to roll back 
> parquet-mr 1.8.2 to 1.8.1 to resolve this dependency conflict.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-08 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043478#comment-16043478
 ] 

Cheng Lian commented on SPARK-20958:


[~marmbrus], here is the draft release note entry:
{quote}
SPARK-20958: For users who use parquet-avro together with Spark 2.2, please use 
parquet-avro 1.8.1 instead of parquet-avro 1.8.2. This is because parquet-avro 
1.8.2 upgrades avro from 1.7.6 to 1.8.1, which is backward incompatible with 
1.7.6.
{quote}

> Roll back parquet-mr 1.8.2 to parquet-1.8.1
> ---
>
> Key: SPARK-20958
> URL: https://issues.apache.org/jira/browse/SPARK-20958
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>  Labels: release-notes, release_notes, releasenotes
>
> We recently realized that parquet-mr 1.8.2 used by Spark 2.2.0-rc2 depends on 
> avro 1.8.1, which is incompatible with avro 1.7.6 used by parquet-mr 1.8.1 
> and avro 1.7.7 used by spark-core 2.2.0-rc2.
> Basically, Spark 2.2.0-rc2 introduced two incompatible versions of avro 
> (1.7.7 and 1.8.1). Upgrading avro 1.7.7 to 1.8.1 is not preferable due to the 
> reasons mentioned in [PR 
> #17163|https://github.com/apache/spark/pull/17163#issuecomment-286563131]. 
> Therefore, we don't really have many choices here and have to roll back 
> parquet-mr 1.8.2 to 1.8.1 to resolve this dependency conflict.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-02 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035149#comment-16035149
 ] 

Cheng Lian commented on SPARK-20958:


Thanks [~rdblue]! I'm also reluctant to roll it back considering those fixes we 
wanted so badly... We decided to give this a try because, from the perspective 
of release management, we'd like to avoid cutting a release with known 
conflicting dependencies, even transitive ones. For a Spark 2.2 user, it's 
quite natural to choose parquet-avro 1.8.2, which is part of parquet-mr 1.8.2, 
which in turn, is a direct dependency of Spark 2.2.0.

However, due to PARQUET-389, rolling back is already not an option. Two options 
I can see here are:

# Release Spark 2.2.0 as is with a statement in the release notes saying that 
users should use parquet-avro 1.8.1 instead of 1.8.2 to avoid the Avro 
compatibility issue.
# Wait for parquet-mr 1.8.3, which hopefully resolves this dependency issue 
(e.g., by reverting PARQUET-358).

> Roll back parquet-mr 1.8.2 to parquet-1.8.1
> ---
>
> Key: SPARK-20958
> URL: https://issues.apache.org/jira/browse/SPARK-20958
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> We recently realized that parquet-mr 1.8.2 used by Spark 2.2.0-rc2 depends on 
> avro 1.8.1, which is incompatible with avro 1.7.6 used by parquet-mr 1.8.1 
> and avro 1.7.7 used by spark-core 2.2.0-rc2.
> Basically, Spark 2.2.0-rc2 introduced two incompatible versions of avro 
> (1.7.7 and 1.8.1). Upgrading avro 1.7.7 to 1.8.1 is not preferable due to the 
> reasons mentioned in [PR 
> #17163|https://github.com/apache/spark/pull/17163#issuecomment-286563131]. 
> Therefore, we don't really have many choices here and have to roll back 
> parquet-mr 1.8.2 to 1.8.1 to resolve this dependency conflict.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-02 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034961#comment-16034961
 ] 

Dongjoon Hyun commented on SPARK-20958:
---

+1 for [~rdblue].

> Roll back parquet-mr 1.8.2 to parquet-1.8.1
> ---
>
> Key: SPARK-20958
> URL: https://issues.apache.org/jira/browse/SPARK-20958
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> We recently realized that parquet-mr 1.8.2 used by Spark 2.2.0-rc2 depends on 
> avro 1.8.1, which is incompatible with avro 1.7.6 used by parquet-mr 1.8.1 
> and avro 1.7.7 used by spark-core 2.2.0-rc2.
> Basically, Spark 2.2.0-rc2 introduced two incompatible versions of avro 
> (1.7.7 and 1.8.1). Upgrading avro 1.7.7 to 1.8.1 is not preferable due to the 
> reasons mentioned in [PR 
> #17163|https://github.com/apache/spark/pull/17163#issuecomment-286563131]. 
> Therefore, we don't really have many choices here and have to roll back 
> parquet-mr 1.8.2 to 1.8.1 to resolve this dependency conflict.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-02 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034950#comment-16034950
 ] 

Ryan Blue commented on SPARK-20958:
---

I don't think it is a good idea to roll back. Spark doesn't depend on 
parquet-avro, where the update to Avro 1.8.1 was made, except for tests where 
it is fine. The backports for Spark in 1.8.2 are worth keeping since there are 
reasonable work-arounds in user projects.

The problem that I've seen on the dev list is when users add parquet-avro to 
their dependencies and the version gets managed to 1.8.2. That will require 
Avro 1.8.1 because parquet-avro calls {{getSchema}} on avro-specific objects. 
But there are a couple reasonable ways to deal with this:

1. Specify a dependency on parquet-avro 1.8.1 that still uses Avro 1.7.x. 
Parquet is backward-compatible with older binaries, so parquet-avro 1.8.1 works 
fine with parquet-hadoop 1.8.2. (This is the recommended work-around.)
2. Shade and relocate Avro 1.8.1 in application Jars, so that Spark can use 
1.7.x and parquet-avro can use 1.8.1.

This was brought up on the dev list, but the user dismissed these work-arounds 
without trying them.

Long-term, we can do a 1.8.3 release to solve this problem, though I think the 
best solution there would be to stop using {{getSchema}} instead of downgrading 
the dependency.

> Roll back parquet-mr 1.8.2 to parquet-1.8.1
> ---
>
> Key: SPARK-20958
> URL: https://issues.apache.org/jira/browse/SPARK-20958
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> We recently realized that parquet-mr 1.8.2 used by Spark 2.2.0-rc2 depends on 
> avro 1.8.1, which is incompatible with avro 1.7.6 used by parquet-mr 1.8.1 
> and avro 1.7.7 used by spark-core 2.2.0-rc2.
> Basically, Spark 2.2.0-rc2 introduced two incompatible versions of avro 
> (1.7.7 and 1.8.1). Upgrading avro 1.7.7 to 1.8.1 is not preferable due to the 
> reasons mentioned in [PR 
> #17163|https://github.com/apache/spark/pull/17163#issuecomment-286563131]. 
> Therefore, we don't really have many choices here and have to roll back 
> parquet-mr 1.8.2 to 1.8.1 to resolve this dependency conflict.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-02 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034310#comment-16034310
 ] 

Cheng Lian commented on SPARK-20958:


[~rdblue] I think the root cause here is we cherry-picked parquet-mr [PR 
#318|https://github.com/apache/parquet-mr/pull/318] to parquet-mr 1.8.2, and 
introduced this avro upgrade.

Tried to roll back parquet-mr back to 1.8.1 but it doesn't work well because 
this brings back 
[PARQUET-389|https://issues.apache.org/jira/browse/PARQUET-389] and breaks some 
test cases involving schema evolution. 

It would be nice if we can have a parquet-mr 1.8.3 or 1.8.2.1 release that has 
[PR #318|https://github.com/apache/parquet-mr/pull/318] reverted from 1.8.2? I 
think cherry-picking that PR is also problematic for parquet-mr because it 
introduces a backward-incompatible dependency change in a maintenance release.

> Roll back parquet-mr 1.8.2 to parquet-1.8.1
> ---
>
> Key: SPARK-20958
> URL: https://issues.apache.org/jira/browse/SPARK-20958
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> We recently realized that parquet-mr 1.8.2 used by Spark 2.2.0-rc2 depends on 
> avro 1.8.1, which is incompatible with avro 1.7.6 used by parquet-mr 1.8.1 
> and avro 1.7.7 used by spark-core 2.2.0-rc2.
> Basically, Spark 2.2.0-rc2 introduced two incompatible versions of avro 
> (1.7.7 and 1.8.1). Upgrading avro 1.7.7 to 1.8.1 is not preferable due to the 
> reasons mentioned in [PR 
> #17163|https://github.com/apache/spark/pull/17163#issuecomment-286563131]. 
> Therefore, we don't really have many choices here and have to roll back 
> parquet-mr 1.8.2 to 1.8.1 to resolve this dependency conflict.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033987#comment-16033987
 ] 

Apache Spark commented on SPARK-20958:
--

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/18181

> Roll back parquet-mr 1.8.2 to parquet-1.8.1
> ---
>
> Key: SPARK-20958
> URL: https://issues.apache.org/jira/browse/SPARK-20958
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> We recently realized that parquet-mr 1.8.2 used by Spark 2.2.0-rc2 depends on 
> avro 1.8.1, which is incompatible with avro 1.7.6 used by parquet-mr 1.8.1 
> and avro 1.7.7 used by spark-core 2.2.0-rc2.
> , Spark 2.2.0-rc2 introduced two incompatible versions of avro (1.7.7 and 
> 1.8.1). Upgrading avro 1.7.7 to 1.8.1 is not preferable due to the reasons 
> mentioned in [PR 
> #17163|https://github.com/apache/spark/pull/17163#issuecomment-286563131]. 
> Therefore, we don't really have many choices here and have to roll back 
> parquet-mr 1.8.2 to 1.8.1 to resolve this dependency conflict.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org