Hi All, I am trying to configure Spark with a MapR Hadoop cluster. I built Spark 2.0 from source with the hadoop-provided option and then, as per the documentation, pointed spark-env.sh at my Hadoop libraries. However, I get an error while the SessionCatalog is being created; please see the exception stack trace below. The point to note is that the default scheme for MapR is "maprfs://", hence the error.
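For anyone hitting the same thing, a possible workaround (a sketch only, untested against MapR) is to set the warehouse location explicitly so that Spark does not try to create "spark-warehouse" relative to the driver's working directory, which resolves against the cluster's default maprfs:// scheme. The app name and the file:///tmp path below are illustrative assumptions, not values from my setup:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical workaround: point spark.sql.warehouse.dir at a location
// the driver user can actually write to, instead of letting Spark derive
// it from the working directory (which picks up the maprfs:// scheme).
val spark = SparkSession.builder()
  .appName("warehouse-dir-example")   // example name, an assumption
  // file:///tmp/spark-warehouse is only an example; any writable URI works
  .config("spark.sql.warehouse.dir", "file:///tmp/spark-warehouse")
  .getOrCreate()
```

The same setting should also be possible cluster-wide via a `spark.sql.warehouse.dir` entry in spark-defaults.conf, avoiding any code change.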
I can see some fixes went in earlier to solve this problem (https://github.com/apache/spark/pull/13348), but another PR later removed that code (https://github.com/apache/spark/pull/13868/files). If I apply the changes from the first PR, it works perfectly fine. Is this intentional, or is it a bug? If it is intentional, does the user always have to run drivers on a Hadoop cluster node? That might make "some" sense in a production environment, but it is not very helpful during development.

Git revision on my fork: d16f9a0b7c464728d7b11899740908e23820a797

Regards,
Rishitesh Mishra, SnappyData (http://www.snappydata.io/)
https://in.linkedin.com/in/rishiteshmishra

Exception Stack
=====================================================================
2016-08-29 18:30:17,0869 ERROR JniCommon fs/client/fileclient/cc/jni_MapRClient.cc:2073 Thread: 18258 mkdirs failed for /rishim1/POCs/spark-2.0.1-SNAPSHOT-bin-custom-spark/spar, error 13
org.apache.spark.SparkException: Unable to create database default as failed to create its directory maprfs:///rishim1/POCs/spark-2.0.1-SNAPSHOT-bin-custom-spark/spark-warehouse
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.liftedTree1$1(InMemoryCatalog.scala:126)
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.createDatabase(InMemoryCatalog.scala:120)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
  at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
  at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
  at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
  at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
  at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
  at
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
  at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:427)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:411)
  ... 48 elided
Caused by: org.apache.hadoop.security.AccessControlException: User rishim(user id 1000) has been denied access to create spark-warehouse
  at com.mapr.fs.MapRFileSystem.makeDir(MapRFileSystem.java:1239)
  at com.mapr.fs.MapRFileSystem.mkdirs(MapRFileSystem.java:1259)
  at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1913)
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.liftedTree1$1(InMemoryCatalog.scala:123)
  ... 62 more