[ https://issues.apache.org/jira/browse/SPARK-22651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16270721#comment-16270721 ]

Apache Spark commented on SPARK-22651:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/19845

> Calling ImageSchema.readImages initiate multiple Hive clients
> -------------------------------------------------------------
>
>                 Key: SPARK-22651
>                 URL: https://issues.apache.org/jira/browse/SPARK-22651
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, PySpark
>    Affects Versions: 2.3.0
>            Reporter: Hyukjin Kwon
>
> While playing with images, I realised that calling {{ImageSchema.readImages}}
> multiple times seems to attempt to create multiple Hive clients.
> {code}
> from pyspark.ml.image import ImageSchema
> data_path = 'data/mllib/images/kittens'
> _ = ImageSchema.readImages(data_path, recursive=True, dropImageFailures=True).collect()
> _ = ImageSchema.readImages(data_path, recursive=True, dropImageFailures=True).collect()
> {code}
> {code}
> ...
> org.datanucleus.exceptions.NucleusDataStoreException: Unable to open a test
> connection to the given database. JDBC url =
> jdbc:derby:;databaseName=metastore_db;create=true, username = APP.
> Terminating connection pool (set lazyInit to true if you expect to start your
> database after your app). Original Exception: ------
> java.sql.SQLException: Failed to start database 'metastore_db' with class loader
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@742f639f, see
> the next exception for details.
> ...
>     at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
> ...
>     at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
> ...
>     at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:180)
> ...
>     at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:348)
>     at org.apache.spark.ml.image.ImageSchema$$anonfun$readImages$2$$anonfun$apply$1.apply(ImageSchema.scala:253)
> ...
> Caused by: ERROR XJ040: Failed to start database 'metastore_db' with class loader
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@742f639f, see
> the next exception for details.
>     at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>     at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
>     ... 121 more
> Caused by: ERROR XSDB6: Another instance of Derby may have already booted the
> database /.../spark/metastore_db.
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/ml/image.py", line 190, in readImages
>     dropImageFailures, float(sampleRatio), seed)
>   File "/.../spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
>   File "/.../spark/python/pyspark/sql/utils.py", line 69, in deco
>     raise AnalysisException(s.split(': ', 1)[1], stackTrace)
> pyspark.sql.utils.AnalysisException: u'java.lang.RuntimeException:
> java.lang.RuntimeException: Unable to instantiate
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;'
> {code}
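The trace suggests that each `readImages` call ends up creating a new Hive client. The second client then fails to boot the embedded Derby `metastore_db` that the first client already holds (ERROR XSDB6). The toy model below sketches that failure mode under this assumption. It is plain Python and not Spark code. `EmbeddedMetastore`, `Session`, `read_images_buggy` and `read_images_fixed` are hypothetical names. Reusing one session through a get-or-create accessor is shown only as one plausible way to avoid a second boot. It is not necessarily what the linked pull request does.

```python
class EmbeddedMetastore:
    """Hypothetical stand-in for an embedded Derby database.

    Only one owner may boot it, loosely mirroring Derby's XSDB6 error.
    Not part of Spark, Hive or Derby.
    """

    def __init__(self):
        self.owner = None

    def boot(self, owner):
        # A second, different owner trying to boot the same store fails.
        if self.owner is not None and self.owner is not owner:
            raise RuntimeError(
                "Another instance may have already booted the database")
        self.owner = owner


class Session:
    """Hypothetical session that lazily creates (and boots) its own client."""

    _active = None  # the single shared session used by get_or_create

    def __init__(self, store):
        self.store = store
        self._client = None

    def client(self):
        if self._client is None:
            self._client = object()  # stands in for a Hive metastore client
            self.store.boot(self._client)
        return self._client

    @classmethod
    def get_or_create(cls, store):
        # Reuse the already-created session; `store` only matters on first call.
        if cls._active is None:
            cls._active = cls(store)
        return cls._active


def read_images_buggy(store):
    # A fresh session per call -> a fresh client -> a second boot attempt.
    return Session(store).client()


def read_images_fixed(store):
    # One shared session -> one client -> the store is booted only once.
    return Session.get_or_create(store).client()
```

In this model, calling `read_images_buggy` twice on the same store raises on the second call, much like the repeated `readImages` calls in the report. Calling `read_images_fixed` twice returns the same client object.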