Hi Yin,
I'm using the spark-hive dependency, and my app's tests pass on Spark 1.3.1.
It seems to be something with Hive & sbt. The following statement works from
spark-shell, but from the sbt console in RC3 I get this error:
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
15/05/29 16:31:06 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@177ac9f4
scala> val data = sqlContext.read.parquet("caches/-1525448137")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
java.lang.IllegalArgumentException: Unable to locate hive jars to connect to metastore using classloader scala.tools.nsc.interpreter.IMain$TranslatingClassLoader. Please set spark.sql.hive.metastore.jars
  at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:206)
  at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:175)
  at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:367)
  at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:367)
  at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:366)
  at org.apache.spark.sql.hive.HiveContext$$anon$1.<init>(HiveContext.scala:379)
  at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:379)
  at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:378)
  at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:901)
  at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:134)
  at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
  at org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:419)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:264)
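As a next step I'll try setting it explicitly from the sbt console, along these lines (an untested sketch; /opt/hive/lib is just a placeholder for wherever the Hive 0.13.1 jars actually live):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Untested sketch: point the metastore client at explicit jars instead
// of the interpreter's classloader. The jar path is a placeholder.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("sbt-console")
  .set("spark.sql.hive.metastore.version", "0.13.1")
  .set("spark.sql.hive.metastore.jars", "/opt/hive/lib/*")
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)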
Thanks,
Peter Rudenko
On 2015-05-29 07:08, Yin Huai wrote:
Justin,
If you are creating multiple HiveContexts in tests, you need to assign
a temporary metastore location to every HiveContext (like what we do
here
<https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L527-L543>).
Otherwise, they all try to connect to the metastore in the current
directory (look for a metastore_db folder).
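Roughly, the pattern looks like this (an untested sketch; newTestHiveContext is a made-up helper, and the linked code builds the configuration up front rather than calling setConf after construction):

import java.io.File
import java.nio.file.Files
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// Untested sketch: give each test HiveContext its own Derby metastore
// and warehouse directory so runs don't fight over ./metastore_db.
def newTestHiveContext(sc: SparkContext): HiveContext = {
  val tempDir = Files.createTempDirectory("spark-metastore").toFile
  val hc = new HiveContext(sc)
  // Set these before anything touches the metastore (metadataHive is lazy).
  hc.setConf("javax.jdo.option.ConnectionURL",
    s"jdbc:derby:;databaseName=${new File(tempDir, "metastore_db").getAbsolutePath};create=true")
  hc.setConf("hive.metastore.warehouse.dir",
    new File(tempDir, "warehouse").getAbsolutePath)
  hc
}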
Peter,
Do you also have the same use case as Justin (creating multiple
HiveContexts in tests)? Can you explain what you meant by "all tests"?
I am probably missing some context here.
Thanks,
Yin
On Thu, May 28, 2015 at 11:28 AM, Peter Rudenko
<petro.rude...@gmail.com> wrote:
I also have the same issue: all tests fail because of a HiveContext /
Derby lock.
Cause: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
[info] java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$anon$1@8066e0e, see the next exception for details.
Also, is there a build for Hadoop 2.6? I don't see it here:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/
Thanks,
Peter Rudenko
On 2015-05-22 22:56, Justin Uang wrote:
I'm working on one of the Palantir teams using Spark, and here is
our feedback:
We have encountered three issues when upgrading to Spark 1.4.0.
I'm not sure they qualify as a -1, as they come from using
non-public APIs and multiple Spark contexts for the purposes of
testing, but I do want to bring them up for awareness =)
1. Our UDT was serializing to a StringType, but strings are now
   represented internally as UTF8String, so we had to change our
   UDT to use UTF8String.apply() and UTF8String.toString() to
   convert to and from String (a rough sketch follows after the
   stack trace below).
2. createDataFrame when using UDTs used to accept values in the
   serialized Catalyst form. Now they're supposed to be in the
   UDT's Java class form (I think this change would already have
   affected us in 1.3.1, since we were on 1.3.0).
3. A Derby database lifecycle management issue with HiveContext.
   We have been using a SparkContextResource JUnit Rule that we
   wrote; it sets up and then tears down a SparkContext and
   HiveContext between unit test runs within the same process
   (possibly the same thread as well). Multiple contexts are not
   being used at once. It used to work in 1.3.0, but now when we
   try to create the HiveContext for the second unit test, it
   complains with the following exception. I have a feeling it
   might have something to do with the Hive object being
   thread-local, and us not explicitly closing the HiveContext
   and everything it holds. The full stack trace is here:
   https://gist.github.com/justinuang/0403d49cdeedf91727cd
Caused by: java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$anon$1@5dea2446, see the next exception for details.
  at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
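To illustrate item 1, here is a rough sketch of the kind of change we made (MyLabel and MyLabelUDT are made-up names, not our actual classes):

import org.apache.spark.sql.types._

// Illustrative only: a minimal UDT backed by StringType. In 1.4,
// StringType values are UTF8String internally, not java.lang.String.
class MyLabel(val value: String) extends Serializable

class MyLabelUDT extends UserDefinedType[MyLabel] {
  override def sqlType: DataType = StringType
  override def userClass: Class[MyLabel] = classOf[MyLabel]

  override def serialize(obj: Any): Any = obj match {
    case l: MyLabel => UTF8String(l.value)        // was: l.value
  }
  override def deserialize(datum: Any): MyLabel = datum match {
    case s: UTF8String => new MyLabel(s.toString) // was: case s: String
  }
}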
On Wed, May 20, 2015 at 10:35 AM Imran Rashid
<iras...@cloudera.com> wrote:
-1
I discovered I accidentally removed the master & worker JSON
endpoints; will restore:
https://issues.apache.org/jira/browse/SPARK-7760
On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell
<pwend...@gmail.com> wrote:
Please vote on releasing the following candidate as
Apache Spark version 1.4.0!
The tag to be voted on is v1.4.0-rc1 (commit 777a081):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e
The release files, including signatures, digests, etc.
can be found at:
http://people.apache.org/~pwendell/spark-1.4.0-rc1/
Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc
The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1092/
The documentation corresponding to this release can be
found at:
http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/
Please vote on releasing this package as Apache Spark 1.4.0!
The vote is open until Friday, May 22, at 17:03 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 1.4.0
[ ] -1 Do not release this package because ...
To learn more about Apache Spark, please see
http://spark.apache.org/
== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.3 workload, running it on this release
candidate, and reporting any regressions.
== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.4 QA period,
so -1 votes should only occur for significant regressions
from 1.3.1. Bugs already present in 1.3.X, minor regressions,
or bugs related to new features will not block this release.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org