Re: Failed to save RDD as text file to local file system
I'm a little late, but I'm posting in case somebody googles this. It seems saveAsTextFile needs the target directory to be world-writable (chmod 777), but a local directory does not grant write permission to other users by default. I tried saving to a mounted drive and was able to save without an error. Without the "file:" prefix, it won't save to the local file system, e.g.

rdd.saveAsTextFile("file:mnt/shared/emp")

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Failed-to-save-RDD-as-text-file-to-local-file-system-tp21050p25300.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
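The permission observation in the message above can be checked before calling saveAsTextFile. The following is only a minimal sketch, assuming a POSIX file system; the object name DirPermissionCheck and the helper othersCanWrite are made up for illustration, and a throwaway temporary directory stands in for the real output parent directory. It uses only java.nio.file, not any Spark API.

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.{PosixFilePermission, PosixFilePermissions}

object DirPermissionCheck {
  // Hypothetical helper: true if users other than the owner and group
  // are allowed to write to the directory (the "w" bit for "other").
  def othersCanWrite(dir: Path): Boolean =
    Files.getPosixFilePermissions(dir).contains(PosixFilePermission.OTHERS_WRITE)

  def main(args: Array[String]): Unit = {
    // A temporary directory stands in for the real parent of the output path.
    val dir = Files.createTempDirectory("save-as-text-demo")
    val user = System.getProperty("user.name")

    // Typical default: owner may write, group and others may not.
    Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwxr-xr-x"))
    println(s"user=$user mode=rwxr-xr-x othersCanWrite=${othersCanWrite(dir)}")

    // Equivalent of chmod 777: everyone may write.
    Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwxrwxrwx"))
    println(s"user=$user mode=rwxrwxrwx othersCanWrite=${othersCanWrite(dir)}")

    Files.delete(dir)
  }
}
```

If the process that writes the output runs as a different user than the one who owns the directory, the "other" write bit is what decides whether the write can succeed, which would fit the behaviour described above.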
Re: Failed to save RDD as text file to local file system
Hi,

Could you try one thing? Make a directory anywhere outside the cloudera folder and then try the same write. Suppose the directory is named testWrite; then run

r.saveAsTextFile("/home/testWrite/")

I think the cloudera/tmp folder does not grant write permission to users other than the Cloudera Manager itself.

Thanks.
RE: Failed to save RDD as text file to local file system
I want to save to a local directory. I have tried the following and get an error:

r.saveAsTextFile("file:/home/cloudera/tmp/out1")
r.saveAsTextFile("file:///home/cloudera/tmp/out1")
r.saveAsTextFile("file:home/cloudera/tmp/out1")

They all generate the following error:

15/01/12 08:31:10 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 5, master01.cloudera): java.io.IOException: Mkdirs failed to create file:/home/cloudera/temp/out1/_temporary/0/_temporary/attempt_201501120831_0001_m_01_5 (exists=false, cwd=file:/var/run/spark/work/app-20150112080951-0002/0)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:801)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1056)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

The user ID that runs Spark and the driver program has full permission on the directory /home/cloudera/tmp/. I can cd there and run mkdir out1 to create the directory without a problem. I then removed out1 and ran r.saveAsTextFile("file:/home/cloudera/tmp/out1"). I got the error above, but the directory out1 was created. It looks like r.saveAsTextFile(...) tries to create the subdirectories out1/_temporary/0/_temporary/attempt_201501120831_0001_m_01_5, and that is what fails.

Has anybody successfully run r.saveAsTextFile(...) to save an RDD to the local file system on Linux?

Ningjun

-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, January 12, 2015 11:25 AM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: Failed to save RDD as text file to local file system

I think you're confusing HDFS paths and local paths. You are cd'ing to a directory and seem to want to write output there, but your path has no scheme and defaults to being an HDFS path. When you use file: you seem to have a permission error (perhaps).
RE: Failed to save RDD as text file to local file system
All right, I removed Cloudera entirely and installed Spark manually on a bare Linux system, and now r.saveAsTextFile(…) works. Thanks.

Regards,
Ningjun Wang
Consulting Software Engineer
LexisNexis
121 Chanlon Road
New Providence, NJ 07974-1541

From: Prannoy [mailto:pran...@sigmoidanalytics.com]
Sent: Tuesday, January 13, 2015 3:01 PM
To: user@spark.apache.org
Subject: Re: Failed to save RDD as text file to local file system

Hi,

Could you try one thing? Make a directory anywhere outside the cloudera folder and then try the same write. Suppose the directory is named testWrite; then run

r.saveAsTextFile("/home/testWrite/")

I think the cloudera/tmp folder does not grant write permission to users other than the Cloudera Manager itself.

Thanks.
Re: Failed to save RDD as text file to local file system
Have you tried simply giving the path where you want to save the file? For instance, in your case just do

r.saveAsTextFile("home/cloudera/tmp/out1")

Don't use the file: prefix. This will create a folder named out1. saveAsTextFile always writes by creating a directory; it does not write the data into a single file. In case you need a single file, you can use the copyMerge API in FileUtil:

FileUtil.copyMerge(fs, new Path("home/cloudera/tmp/out1"), fs, new Path("home/cloudera/tmp/out2"), true, conf, null);

Now out2 will be a single file containing your data. fs is the FileSystem object for your local file system, and conf is the Hadoop Configuration.

Thanks

On Sat, Jan 10, 2015 at 1:36 AM, NingjunWang [via Apache Spark User List] wrote:

No, do you have any idea?
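The note above that saveAsTextFile produces a directory of part files, which then have to be merged to obtain a single file, can be illustrated without Hadoop. The sketch below is not FileUtil.copyMerge itself; it is a plain java.nio.file approximation of the same idea, and the names MergeParts and mergeParts are hypothetical. It assumes the output directory contains files named part-00000, part-00001, and so on, and, unlike the copyMerge call above with deleteSource set to true, it leaves the source directory in place.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.collection.mutable.ArrayBuffer

object MergeParts {
  // Hypothetical helper: concatenate every part-* file in outDir,
  // in file-name order, into the single file `target`.
  def mergeParts(outDir: Path, target: Path): Unit = {
    val stream = Files.newDirectoryStream(outDir, "part-*")
    val parts = try {
      val buf = ArrayBuffer.empty[Path]
      val it = stream.iterator()
      while (it.hasNext) buf += it.next()
      buf.sortBy(_.getFileName.toString)
    } finally stream.close()

    val out = Files.newOutputStream(target)
    try parts.foreach(p => Files.copy(p, out))
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    // Fake a small saveAsTextFile output directory for the demonstration.
    val outDir = Files.createTempDirectory("out1")
    Files.write(outDir.resolve("part-00000"), "a\nb\n".getBytes(StandardCharsets.UTF_8))
    Files.write(outDir.resolve("part-00001"), "c\n".getBytes(StandardCharsets.UTF_8))
    Files.write(outDir.resolve("_SUCCESS"), Array.emptyByteArray)

    val merged = Files.createTempFile("out2", ".txt")
    mergeParts(outDir, merged)
    print(new String(Files.readAllBytes(merged), StandardCharsets.UTF_8))
  }
}
```

The glob part-* deliberately skips marker files such as _SUCCESS, so only the data files end up in the merged output.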
Re: Failed to save RDD as text file to local file system
I think you're confusing HDFS paths and local paths. You are cd'ing to a directory and seem to want to write output there, but your path has no scheme and defaults to being an HDFS path. When you use file: you seem to have a permission error (perhaps).
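The point in this reply, that a path with no scheme is resolved against the default (HDFS) file system rather than the local one, can be sketched in a few lines. This is an illustration only, not Hadoop's resolution code: qualify is a hypothetical helper, and the default file system URI hdfs://namenode:8020 and the HDFS working directory /user/cloudera are assumed values chosen for the example (on a real cluster the former comes from the fs.defaultFS setting).

```scala
import java.net.URI

object PathSchemeDemo {
  // Assumed values for illustration; they are not taken from the thread.
  val defaultFs = "hdfs://namenode:8020"
  val hdfsHome = "/user/cloudera"

  // Hypothetical helper mimicking how a path string is qualified:
  // an explicit scheme is kept, anything else lands on the default file system.
  def qualify(path: String): String = {
    val uri = new URI(path)
    if (uri.getScheme != null) path                  // e.g. file:///..., used as given
    else if (path.startsWith("/")) defaultFs + path  // absolute path on the default FS
    else s"$defaultFs$hdfsHome/$path"                // relative to the HDFS working directory
  }

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "out1",
      "home/cloudera/tmp/out1",
      "/home/cloudera/tmp/out1",
      "file:///home/cloudera/tmp/out1")
    samples.foreach(p => println(s"$p -> ${qualify(p)}"))
  }
}
```

Under these assumptions, "out1" ends up under the user's HDFS directory rather than in the shell's current directory, and "/home/cloudera/tmp/out1" requires creating /home on HDFS, which would be consistent with a write-permission failure on the HDFS root.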
RE: Failed to save RDD as text file to local file system
Prannoy,

I tried r.saveAsTextFile("home/cloudera/tmp/out1"), and it returns without error. But where is the output saved? The folder "/home/cloudera/tmp/out1" is not created.

I also tried the following:

cd /home/cloudera/tmp/
spark-shell
scala> val r = sc.parallelize(Array("a", "b", "c"))
scala> r.saveAsTextFile("out1")

It does not return an error, but there is still no "out1" folder under /home/cloudera/tmp/.

I then tried an absolute path, but got an error:

scala> r.saveAsTextFile("/home/cloudera/tmp/out1")
org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode=/:hdfs:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6286)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6268)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6220)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4087)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4057)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4030)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:787)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

Very frustrated. Please advise.

Regards,
Ningjun Wang
Re: Failed to save RDD as text file to local file system
Have you found any resolution for this issue?
RE: Failed to save RDD as text file to local file system
No, do you have any idea?

Regards,
Ningjun Wang

From: firemonk9 [via Apache Spark User List]
Sent: Friday, January 09, 2015 2:56 PM
To: Wang, Ningjun (LNG-NPV)
Subject: Re: Failed to save RDD as text file to local file system

Have you found any resolution for this issue?
Failed to save RDD as text file to local file system
I am trying to save an RDD as a text file to the local file system (Linux), but it does not work.

Launch spark-shell and run the following:

val r = sc.parallelize(Array("a", "b", "c"))
r.saveAsTextFile("file:///home/cloudera/tmp/out1")

IOException: Mkdirs failed to create file:/home/cloudera/tmp/out1/_temporary/0/_temporary/attempt_201501082027_0003_m_00_47 (exists=false, cwd=file:/var/run/spark/work/app-20150108201046-0021/0)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:801)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1056)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

I also tried with 4 slashes but still get the same error:

r.saveAsTextFile("file:////home/cloudera/tmp/out1")

Please advise.

Ningjun
Re: Failed to save RDD as text file to local file system
It looks like it is trying to save the file in HDFS. Check whether you have set any Hadoop path on your system.

On Fri, Jan 9, 2015 at 12:14 PM, Raghavendra Pandey wrote:

Can you check permissions etc.? I am able to run r.saveAsTextFile("file:///home/cloudera/tmp/out1") successfully on my machine.