Re: Strange behavior of spark-shell while accessing hdfs
Thanks, everyone, for the info. I have to use YARN to access a Kerberos-secured cluster.
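For anyone landing here later, a minimal sketch of that setup, assuming a Kerberos principal and Hadoop config path that are placeholders rather than details from this thread:

    # Get a Kerberos ticket first; Spark on YARN obtains HDFS delegation
    # tokens on your behalf when the application is submitted
    kinit user@EXAMPLE.COM

    # Point Spark at the cluster's Hadoop/YARN configuration
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    # Launch against YARN rather than a standalone master
    ./bin/spark-shell --master yarn-client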
Re: spark-shell exception while running in YARN mode
The Pi example gives the same error in YARN mode:

    HADOOP_CONF_DIR=/home/gs/conf/current ./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client ../examples/target/spark-examples_2.10-1.2.0-SNAPSHOT.jar

What could be wrong here?
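With yarn-client submissions the driver console often shows only a generic failure; the underlying exception usually lives in the YARN container logs. A hedged way to pull them (the application id below is a placeholder):

    # Find the id of the failed application
    yarn application -list -appStates ALL | grep SparkPi

    # Fetch its aggregated container logs
    yarn logs -applicationId application_1415000000000_0001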
Re: disable log4j for spark-shell
Tried --driver-java-options and SPARK_JAVA_OPTS; neither worked. Had to change the bundled default log4j configuration and rebuild.
Re: disable log4j for spark-shell
Even after changing core/src/main/resources/org/apache/spark/log4j-defaults.properties to WARN followed by a rebuild, the log level is still INFO. Any other suggestions?
Re: disable log4j for spark-shell
Some console messages:

    14/11/10 20:04:33 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:46713
    14/11/10 20:04:33 INFO util.Utils: Successfully started service 'HTTP file server' on port 46713.
    14/11/10 20:04:34 INFO server.Server: jetty-8.y.z-SNAPSHOT
    14/11/10 20:04:34 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
    14/11/10 20:04:34 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
    14/11/10 20:04:34 INFO netty.NettyBlockTransferService: Server created on 46997
    14/11/10 20:04:34 INFO storage.BlockManagerMaster: Trying to register BlockManager
    14/11/10 20:04:34 INFO storage.BlockManagerMasterActor: Registering block manager localhost:46997 with 265.0 MB RAM, BlockManagerId(driver, localhost, 46997)
    14/11/10 20:04:35 INFO storage.BlockManagerMaster: Registered BlockManager

and log4j-defaults.properties looks like:

    cat core/src/main/resources/org/apache/spark/log4j-defaults.properties
    # Set everything to be logged to the console
    log4j.rootCategory=WARN, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.eclipse.jetty=WARN
    log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=WARN
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=WARN

Any suggestions?
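One thing worth trying before rebuilding at all: spark-shell puts the conf/ directory on the driver classpath, and a log4j.properties found there takes precedence over the log4j-defaults.properties bundled in the assembly jar. A sketch, assuming a stock Spark layout:

    # From the Spark installation root
    cp conf/log4j.properties.template conf/log4j.properties

    # Then edit conf/log4j.properties and set:
    #   log4j.rootCategory=WARN, console

    ./bin/spark-shell

If the rebuilt defaults still show INFO, it may also be that the shell is picking up a previously built assembly jar rather than the one containing your edit.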
Strange behavior of spark-shell while accessing hdfs
I am trying spark-shell on a single host and got some strange behavior from it. If I run bin/spark-shell without connecting to a master, it can access an HDFS file on a remote cluster with Kerberos authentication:

    scala> val textFile = sc.textFile("hdfs://*.*.*.*:8020/user/lih/drill_test/test.csv")
    scala> textFile.count()
    res0: Long = 9

However, if I start the master and a slave on the same host, connect with

    bin/spark-shell --master spark://*.*.*.*:7077

and run the same commands, I get:

    scala> textFile.count()
    14/11/11 05:00:23 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, stgace-launcher06.diy.corp.ne1.yahoo.com): java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: *.*.*.*.com/98.138.236.95; destination host is: *.*.*.*:8020;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
        at org.apache.hadoop.ipc.Client.call(Client.java:1375)
        at org.apache.hadoop.ipc.Client.call(Client.java:1324)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:225)
        at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy20.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1165)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1155)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1145)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:268)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:235)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:228)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1318)
        at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:293)
        at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:289)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:289)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:195)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
    Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:657)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
        at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:621)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:368)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1423)
        at org.apache.hadoop.ipc.Client.call(Client.java:1342)
        ... 38 more
    Caused by:
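The failure pattern here (works from the local driver, fails from standalone executors) matches the resolution noted earlier in the thread: a standalone Spark master/worker has no mechanism for distributing Kerberos credentials or HDFS delegation tokens to executors, so only the driver, which shares the local ticket cache, can authenticate; running on YARN is the usual fix. To rule out the ticket itself, a quick check, sketched with a placeholder hostname:

    # Confirm there is a valid ticket in the local cache
    klist

    # Read the path with the plain HDFS client, which uses the same ticket
    # as the spark-shell driver; if this succeeds while executor tasks fail,
    # the problem is credential distribution, not the ticket
    hdfs dfs -ls hdfs://namenode.example.com:8020/user/lih/drill_test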
Spark 1.1.0 with Hadoop 2.5.0
Does Spark 1.1.0 work with Hadoop 2.5.0? The Maven build instructions only list profile options up to Hadoop 2.4. Has anybody made it work? I am trying to run spark-sql with Hive 0.12 on top of Hadoop 2.5.0 but can't make it work.
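For what it's worth, the 1.1 build documentation treats the hadoop-2.4 profile as covering 2.4.x and later, with the exact release chosen via -Dhadoop.version, so a build against 2.5.0 would look like the sketch below (profiles per the 1.1 docs; not verified against every 2.5.0 setup):

    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0 -Phive -DskipTests clean package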
Re: How to make ./bin/spark-sql work with hive?
No, it is Hive 0.12.4. Let me try your suggestion. It is an existing Hive DB; I am using the original hive-site.xml as is.

Sent from my iPhone

On Oct 3, 2014, at 5:02 PM, Edwin Chiu edwin.c...@manage.com wrote:

Are you using Hive 0.13? Switching back to HadoopDefaultAuthenticator in your hive-site.xml is worth a shot:

    <property>
      <name>hive.security.authenticator.manager</name>
      <!--<value>org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator</value>-->
      <value>org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator</value>
    </property>

- Edwin

On Fri, Oct 3, 2014 at 4:25 PM, Li HM hmx...@gmail.com wrote:

If I don't have that jar, I am getting the following error:

    Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hcatalog.security.HdfsAuthorizationProvider
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:286)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:116)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hcatalog.security.HdfsAuthorizationProvider
        at org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthorizeProviderManager(HiveUtils.java:342)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:280)
        ... 9 more
    Caused by: java.lang.ClassNotFoundException: org.apache.hcatalog.security.HdfsAuthorizationProvider
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:266)
        at org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthorizeProviderManager(HiveUtils.java:335)
        ... 10 more

On Fri, Oct 3, 2014 at 3:27 PM, Michael Armbrust mich...@databricks.com wrote:

Why are you including hcatalog-core.jar? That is probably causing the issues.

On Fri, Oct 3, 2014 at 3:03 PM, Li HM hmx...@gmail.com wrote:

This is my SPARK_CLASSPATH after cleanup:

    SPARK_CLASSPATH=/home/test/lib/hcatalog-core.jar:$SPARK_CLASSPATH

Now "use mydb" works, but "show tables" and "select * from test" still give an exception:

    spark-sql> show tables;
    OK
    java.io.IOException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:551)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1471)
        at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:305)
        at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:272)
        at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
        at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
        at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:38)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
        at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
        at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:103)
        at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:98)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:58)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at
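The ClassNotFoundException above suggests hive-site.xml still names the HCatalog provider, so dropping hcatalog-core.jar just breaks the class lookup. Rather than keeping the jar on SPARK_CLASSPATH, one hedged alternative is to point the property back at Hive's built-in provider; the class name below is from stock Hive 0.12, not from this thread, so verify it against your hive-site.xml:

    <property>
      <name>hive.security.authorization.manager</name>
      <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
    </property>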