Re: Failed to save RDD as text file to local file system

2015-11-05 Thread Hitoshi Ozawa
I'm a little bit late but posting in case somebody googles this.

It seems saveAsTextFile requires the output directory to be writable by the user
running the executors (in practice, chmod 777), but a local directory won't grant
write permission to other users by default.
I've tried saving to a mounted drive and was able to save without an error.

Without the "file" scheme, it won't save to the local file system.

e.g.
rdd.saveAsTextFile("file:///mnt/shared/emp")
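
For what it's worth, a minimal spark-shell sequence along those lines (just a sketch;
/mnt/shared is only an example of a mount that every node's executor user can write to):

val rdd = sc.parallelize(Seq("a", "b", "c"))
rdd.saveAsTextFile("file:///mnt/shared/emp")      // explicit file:// scheme -> local file system, not the default FS
sc.textFile("file:///mnt/shared/emp").collect()   // read it back to confirm the part files landed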






Re: Failed to save RDD as text file to local file system

2015-01-08 Thread Raghavendra Pandey
Can you check permissions etc.? I am able to run
r.saveAsTextFile("file:///home/cloudera/tmp/out1")
successfully on my machine.
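
One quick way to check from the spark-shell (a rough sketch; note this only tests the
user running the driver, not the user each executor runs as):

import java.io.File
val dir = new File("/home/cloudera/tmp")                      // parent of the output path
println(s"exists=${dir.exists}, canWrite=${dir.canWrite}")    // both should be true for the driver-side user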

On Fri, Jan 9, 2015 at 10:25 AM, NingjunWang 
wrote:

> I'm trying to save an RDD as a text file to the local file system (Linux), but it
> does not work.
>
> I launch spark-shell and run the following:
>
> val r = sc.parallelize(Array("a", "b", "c"))
> r.saveAsTextFile("file:///home/cloudera/tmp/out1")
>
>
> IOException: Mkdirs failed to create
>
> file:/home/cloudera/tmp/out1/_temporary/0/_temporary/attempt_201501082027_0003_m_00_47
> (exists=false, cwd=file:/var/run/spark/work/app-20150108201046-0021/0)
> at
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442)
> at
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:801)
> at
>
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
> at
> org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
> at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1056)
> at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
> at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
> I also tried varying the number of slashes but still get the same error:
> r.saveAsTextFile("file:home/cloudera/tmp/out1")
>
> Please advise
>
> Ningjun
>
>
>
>
>


Re: Failed to save RDD as text file to local file system

2015-01-08 Thread VISHNU SUBRAMANIAN
Looks like it is trying to save the file in HDFS.

Check whether a Hadoop default file system path is configured on your system.
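
One way to check that from the spark-shell (a sketch; on very old Hadoop configs the
property is fs.default.name instead):

val defaultFs = sc.hadoopConfiguration.get("fs.defaultFS")
println(defaultFs)   // e.g. hdfs://namenode:8020 means paths without a scheme go to HDFS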

On Fri, Jan 9, 2015 at 12:14 PM, Raghavendra Pandey <
raghavendra.pan...@gmail.com> wrote:

> Can you check permissions etc as I am able to run
> r.saveAsTextFile("file:///home/cloudera/tmp/out1") successfully on my
> machine..
>


Re: Failed to save RDD as text file to local file system

2015-01-09 Thread firemonk9
Have you found any resolution for this issue?






RE: Failed to save RDD as text file to local file system

2015-01-09 Thread NingjunWang
No, do you have any idea?

Regards,

Ningjun Wang
Consulting Software Engineer
LexisNexis
121 Chanlon Road
New Providence, NJ 07974-1541

From: firemonk9 [via Apache Spark User List] 
[mailto:ml-node+s1001560n21067...@n3.nabble.com]
Sent: Friday, January 09, 2015 2:56 PM
To: Wang, Ningjun (LNG-NPV)
Subject: Re: Failed to save RDD as text file to local file system

Have you found any resolution for this issue ?






Re: Failed to save RDD as text file to local file system

2015-01-12 Thread Prannoy
Have you tried simply giving the path where you want to save the file?

For instance, in your case just do

r.saveAsTextFile("home/cloudera/tmp/out1")

Don't use "file:".

This will create a folder named out1. saveAsTextFile always writes by
creating a directory; it does not write the data into a single file.

In case you need a single file, you can use the copyMerge API in FileUtil.

FileUtil.copyMerge(fs, new Path("home/cloudera/tmp/out1"), fs, new Path("home/cloudera/tmp/out2"),
true, conf, null)

Now out2 will be a single file containing your data.

fs is the FileSystem instance for your local file system.
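
A self-contained version of that merge step, as a rough sketch (assumes a Hadoop 2.x
client, where FileUtil.copyMerge still exists; the paths are only illustrative):

import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
val conf = sc.hadoopConfiguration
val fs = FileSystem.getLocal(conf)                            // FileSystem object for the local file system
FileUtil.copyMerge(fs, new Path("/home/cloudera/tmp/out1"),   // source directory of part files
                   fs, new Path("/home/cloudera/tmp/out2"),   // destination single file
                   true, conf, null)                          // true = delete the source dir after merging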

Thanks



On Sat, Jan 10, 2015 at 1:36 AM, NingjunWang [via Apache Spark User List] <
ml-node+s1001560n21068...@n3.nabble.com> wrote:

>  No, do you have any idea?





Re: Failed to save RDD as text file to local file system

2015-01-12 Thread Sean Owen
Without a scheme, it will be interpreted relative to the default FS
configured in the Hadoop configuration, which is almost surely HDFS.

No, the stack trace does not imply it was writing to HDFS. It would
go through the Hadoop FileSystem API in any event, but you can see that the
path was a file: URI.

The goal is to write a local file. I don't see that multiple files are
a problem either.

The original problem is simply that mkdirs failed and that's almost
surely a permission issue. I don't see that this has been addressed by
the OP.
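
If it helps, a small diagnostic along those lines (a rough sketch; the target path is just
an example): it reports, from each executor, which OS user the task runs as and whether
that user can write to the parent directory.

val target = "/home/cloudera/tmp/out1"
sc.parallelize(1 to 4, 4).map { _ =>
  val user = System.getProperty("user.name")                              // OS user of the executor JVM
  val parentWritable = new java.io.File(target).getParentFile.canWrite    // can it create out1 there?
  s"host=${java.net.InetAddress.getLocalHost.getHostName} user=$user parentWritable=$parentWritable"
}.distinct().collect().foreach(println)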

On Mon, Jan 12, 2015 at 9:18 AM, Prannoy  wrote:
> Have you tried simple giving the path where you want to save the file ?
>
> For instance in your case just do
>
> r.saveAsTextFile("home/cloudera/tmp/out1")




RE: Failed to save RDD as text file to local file system

2015-01-12 Thread NingjunWang
Prannoy

I tried r.saveAsTextFile("home/cloudera/tmp/out1"); it returns without error. But
where is it saved to? The folder “/home/cloudera/tmp/out1” is not created.

I also tried the following
cd /home/cloudera/tmp/
spark-shell
scala> val r = sc.parallelize(Array("a", "b", "c"))
scala> r.saveAsTextFile("out1")

It does not return an error, but still no “out1” folder is created under
/home/cloudera/tmp/.

I tried to give an absolute path but then get an error:

scala> r.saveAsTextFile("/home/cloudera/tmp/out1")
org.apache.hadoop.security.AccessControlException: Permission denied: 
user=cloudera, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6286)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6268)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6220)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4087)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4057)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4030)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:787)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

Very frustrated. Please advise.


Regards,

Ningjun Wang
Consulting Software Engineer
LexisNexis
121 Chanlon Road
New Providence, NJ 07974-1541

From: Prannoy [via Apache Spark User List] 
[mailto:ml-node+s1001560n21093...@n3.nabble.com]
Sent: Monday, January 12, 2015 4:18 AM
To: Wang, Ningjun (LNG-NPV)
Subject: Re: Failed to save RDD as text file to local file system

Have you tried simple giving the path where you want to save the file ?

For instance in your case just do

r.saveAsTextFile("home/cloudera/tmp/out1")


Re: Failed to save RDD as text file to local file system

2015-01-12 Thread Sean Owen
I think you're confusing HDFS paths and local paths. You are cd'ing to
a directory and seem to want to write output there, but your path has
no scheme and defaults to being an HDFS path. When you use "file:" you
seem to have a permission error (perhaps).
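
To make the distinction concrete, a quick check from the spark-shell (a sketch; the
paths are illustrative):

import org.apache.hadoop.fs.Path
val hconf = sc.hadoopConfiguration
println(new Path("out1").getFileSystem(hconf).getUri)
// -> the default FS URI, e.g. hdfs://..., and the relative path resolves under /user/<you>
println(new Path("file:///home/cloudera/tmp/out1").getFileSystem(hconf).getUri)
// -> file:///, i.e. the local file system of whichever node does the writing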

On Mon, Jan 12, 2015 at 4:21 PM, NingjunWang
 wrote:
> Prannoy
>
>
>
> I tried this r.saveAsTextFile("home/cloudera/tmp/out1"), it return without
> error. But where does it saved to? The folder “/home/cloudera/tmp/out1” is
> not cretaed.

RE: Failed to save RDD as text file to local file system

2015-01-13 Thread Wang, Ningjun (LNG-NPV)
I want to save to a local directory. I have tried the following and get an error:

r.saveAsTextFile("file:/home/cloudera/tmp/out1")
r.saveAsTextFile("file:///home/cloudera/tmp/out1")
r.saveAsTextFile("file:home/cloudera/tmp/out1")

They all generate the following error:
15/01/12 08:31:10 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 5, 
master01.cloudera): java.io.IOException: Mkdirs failed to create 
file:/home/cloudera/temp/out1/_temporary/0/_temporary/attempt_201501120831_0001_m_01_5
 (exists=false, cwd=file:/var/run/spark/work/app-20150112080951-0002/0)
at 
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442)
at 
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:801)
at 
org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1056)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


The id that runs Spark and the driver program has full permission on the dir
/home/cloudera/tmp/. I can cd there and run "mkdir out1" to create the dir
without a problem. I then removed the dir "out1" and ran
r.saveAsTextFile("file:/home/cloudera/tmp/out1")

I got the error above, but the dir "out1" is created. It looks like
r.saveAsTextFile(...) tries to create the sub-dirs
out1/_temporary/0/_temporary/attempt_201501120831_0001_m_01_5, which fails.

Has anybody successfully run r.saveAsTextFile(...) to save an RDD to the local file
system on Linux?

Ningjun


-Original Message-
From: Sean Owen [mailto:so...@cloudera.com] 
Sent: Monday, January 12, 2015 11:25 AM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: Failed to save RDD as text file to local file system

I think you're confusing HDFS paths and local paths. You are cd'ing to a 
directory and seem to want to write output there, but your path has no scheme 
and defaults to being an HDFS path. When you use "file:" you seem to have a 
permission error (perhaps).

On Mon, Jan 12, 2015 at 4:21 PM, NingjunWang  
wrote:
> Prannoy
>
>
>
> I tried this r.saveAsTextFile("home/cloudera/tmp/out1"), it return 
> without error. But where does it saved to? The folder 
> “/home/cloudera/tmp/out1” is not cretaed.

Re: Failed to save RDD as text file to local file system

2015-01-13 Thread Prannoy
Hi,

Could you just try one thing? Make a directory anywhere outside the cloudera
home directory and then try the same write.

Suppose the directory created is testWrite.

do r.saveAsTextFile("/home/testWrite/")

I think the cloudera/tmp folder does not have write permission for users
other than the cloudera manager itself.

Thanks.

On Mon, Jan 12, 2015 at 9:51 PM, NingjunWang [via Apache Spark User List] <
ml-node+s1001560n21105...@n3.nabble.com> wrote:

>  Prannoy
>
>
>
> I tried this r.saveAsTextFile("home/cloudera/tmp/out1"), it return
> without error. But where does it saved to? The folder
> “/home/cloudera/tmp/out1” is not cretaed.

RE: Failed to save RDD as text file to local file system

2015-01-13 Thread Wang, Ningjun (LNG-NPV)
All right, I removed Cloudera entirely and installed Spark manually on a bare Linux
system, and now r.saveAsTextFile(…) works.

Thanks.

Regards,

Ningjun Wang
Consulting Software Engineer
LexisNexis
121 Chanlon Road
New Providence, NJ 07974-1541

From: Prannoy [mailto:pran...@sigmoidanalytics.com]
Sent: Tuesday, January 13, 2015 3:01 PM
To: user@spark.apache.org
Subject: Re: Failed to save RDD as text file to local file system

Hi,

Could you just trying one thing. Make a directory any where out side cloudera 
and than try the same write.

Suppose the directory made is testWrite.

do r.saveAsTextFile("/home/testWrite/")

I think cloudera/tmp folder do not have a write permission for users hosted 
other than the cloudera manager itself.

Thanks.
