[jira] [Comment Edited] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

Arun (JIRA) Thu, 18 Jun 2015 05:49:54 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591723#comment-14591723
 ]


Arun edited comment on SPARK-8409 at 6/18/15 12:48 PM:
-------------------------------------------------------

 Hi Shivaram,

I got the below error when i did as you told, reading from hdfs for csv file, 
kindly make a note that the HDFS path which i have given is syntax correct.

TIA
>df_1 <- read.df(sqlContext, 
>"hdfs://ABRLMISDEV:8020/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv",
 "com.databricks.spark.csv", header="true")
15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandl
er.scala:127)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:74)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:36)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChanne
lInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToM
essageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessage
Decoder.java:163)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChanne
lPipeline.java:787)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:130)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:116)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to load class for data source: com
.databricks.spark.csv
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl
.scala:216)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)

        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1230)
        ... 25 more
Error: returnStatus == 0 is not TRUE


was (Author: b.arunguna...@gmail.com):
 Hi Shivaram,

I got the below error when i did as you told, reading from hdfs for csv file, 
kindly make a note that the HDFS link which i have given is syntax correct.

TIA
>df_1 <- read.df(sqlContext, 
>"hdfs://ABRLMISDEV:8020/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv",
 "com.databricks.spark.csv", header="true")
15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandl
er.scala:127)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:74)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:36)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChanne
lInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToM
essageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessage
Decoder.java:163)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChanne
lPipeline.java:787)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:130)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:116)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to load class for data source: com
.databricks.spark.csv
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl
.scala:216)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)

        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1230)
        ... 25 more
Error: returnStatus == 0 is not TRUE

>  In windows cant able to read .csv or .json files using read.df()
> -----------------------------------------------------------------
>
>                 Key: SPARK-8409
>                 URL: https://issues.apache.org/jira/browse/SPARK-8409
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.4.0
>         Environment: sparkR API
>            Reporter: Arun
>            Priority: Critical
>              Labels: build
>
> Hi, 
> In SparkR shell, I invoke: 
> > mydf<-read.df(sqlContext, "/home/esten/ami/usaf.json", source="json", 
> > header="false") 
> I have tried various filetypes (csv, txt), all fail.   
>  in sparkR of spark 1.4 for eg.) df_1<- read.df(sqlContext, 
> "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
>  source = "csv")
> RESPONSE: "ERROR RBackendHandler: load on 1 failed" 
> BELOW THE WHOLE RESPONSE: 
> 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with 
> curMem=0, maxMem=278302556 
> 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in 
> memory (estimated size 173.4 KB, free 265.2 MB) 
> 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with 
> curMem=177600, maxMem=278302556 
> 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
> in memory (estimated size 16.2 KB, free 265.2 MB) 
> 15/06/16 08:09:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
> on localhost:37142 (size: 16.2 KB, free: 265.4 MB) 
> 15/06/16 08:09:13 INFO SparkContext: Created broadcast 0 from load at 
> NativeMethodAccessorImpl.java:-2 
> 15/06/16 08:09:16 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded. 
> 15/06/16 08:09:17 ERROR RBackendHandler: load on 1 failed 
> java.lang.reflect.InvocationTargetException 
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
>         at java.lang.reflect.Method.invoke(Method.java:606) 
>         at 
> org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
>  
>         at 
> org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74) 
>         at 
> org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36) 
>         at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>  
>         at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>  
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>  
>         at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
>  
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>  
>         at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>  
>         at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
>  
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>  
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>  
>         at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>  
>         at java.lang.Thread.run(Thread.java:745) 
> Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does 
> not exist: hdfs://smalldata13.hdp:8020/home/esten/ami/usaf.json 
>         at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>  
>         at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228) 
>         at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) 
>         at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207) 
>         at 
> org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) 
>         at 
> org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) 
>         at scala.Option.getOrElse(Option.scala:120) 
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) 
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>  
>         at 
> org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) 
>         at 
> org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) 
>         at scala.Option.getOrElse(Option.scala:120) 
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) 
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>  
>         at 
> org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) 
>         at 
> org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) 
>         at scala.Option.getOrElse(Option.scala:120) 
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) 
>         at 
> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1069) 
>         at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>  
>         at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>  
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:286) 
>         at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1067) 
>         at org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58) 
>         at 
> org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139)
>  
>         at 
> org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:138)
>  
>         at scala.Option.getOrElse(Option.scala:120) 
>         at 
> org.apache.spark.sql.json.JSONRelation.schema$lzycompute(JSONRelation.scala:137)
>  
>         at 
> org.apache.spark.sql.json.JSONRelation.schema(JSONRelation.scala:137) 
>         at 
> org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30) 
>         at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120) 
>         at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1230) 
>         ... 25 more 
> Error: returnStatus == 0 is not TRUE
>  
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Comment Edited] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

Reply via email to