[jira] [Commented] (SPARK-12378) CREATE EXTERNAL TABLE AS SELECT EXPORT AWS S3 ERROR
[ https://issues.apache.org/jira/browse/SPARK-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358393#comment-16358393 ]

Arun commented on SPARK-12378:
------------------------------

I am also hitting this issue when inserting data into Hive from Spark. My table is an external table stored in AWS S3. The data does get inserted into the table, but this message is printed:

{code:java}
-chgrp: '' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
18/02/09 13:25:56 ERROR KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
-chgrp: '' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}

Is there any resolution, please?

> CREATE EXTERNAL TABLE AS SELECT EXPORT AWS S3 ERROR
> ---------------------------------------------------
>
>                 Key: SPARK-12378
>                 URL: https://issues.apache.org/jira/browse/SPARK-12378
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>         Environment: AWS EMR 4.2.0
>                      Just master running, m3.xlarge
>                      Applications: Hive 1.0.0, Spark 1.5.2
>            Reporter: CESAR MICHELETTI
>            Priority: Major
>
> I receive the error below when trying to export data to AWS S3 in spark-sql.
>
> Command:
> CREATE EXTERNAL TABLE export
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
> -- lines terminated by '\n'
> STORED AS TEXTFILE
> LOCATION 's3://xxx/yyy'
> AS
> SELECT
> xxx
>
> (complete query)
> ;
>
> Error:
> -chgrp: '' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: '' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> 15/12/16 21:09:25 ERROR SparkSQLDriver: Failed in [CREATE external TABLE
> csvexport
> ...
> (create table + query)
> ...
> java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:441)
>         at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply$mcV$sp(ClientWrapper.scala:489)
>         at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489)
>         at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489)
>         at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:256)
>         at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:211)
>         at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:248)
>         at org.apache.spark.sql.hive.client.ClientWrapper.loadTable(ClientWrapper.scala:488)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:263)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
>         at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
>         at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
>         at org.apache.spark.sql.hive.execution.CreateTableAsSelect.run(CreateTableAsSelect.scala:89)
>         at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
>         at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
>         at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
[jira] [Reopened] (SPARK-22561) Dynamically update topics list for spark kafka consumer
[ https://issues.apache.org/jira/browse/SPARK-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun reopened SPARK-22561:
--------------------------

Thanks [~c...@koeninger.org] - "SubscribePattern allows you to use a regex to specify topics of interest. Note that unlike the 0.8 integration, using Subscribe or SubscribePattern should respond to adding partitions during a running stream."

I tested SubscribePattern. It is good for the case where we don't want to pass an explicit list of topics: the Spark streaming job can pick up topics matching the regex and start processing them. But my question is not about loading topics based on a pattern - the question is: once the stream is materialized and running, I would like to add a new topic on the fly, without restarting the job.

> Dynamically update topics list for spark kafka consumer
> --------------------------------------------------------
>
>                 Key: SPARK-22561
>                 URL: https://issues.apache.org/jira/browse/SPARK-22561
>             Project: Spark
>          Issue Type: New Feature
>          Components: DStreams
>    Affects Versions: 2.1.0, 2.1.1, 2.2.0
>            Reporter: Arun
>
> The Spark Streaming application should allow adding new topics after the
> streaming context is initialized and the DStream is started. This is a very
> useful feature, especially when a business operates across multiple
> geographies or business units. For example, initially I have a spark-kafka
> consumer listening for the topics ["topic-1","topic-2"], and after a couple
> of days the new topics ["topic-3","topic-4"] are added to Kafka. Is there a
> way to update the spark-kafka consumer's topic list and have it consume data
> for the updated list of topics without stopping the Spark Streaming
> application or the streaming context?
[jira] [Created] (SPARK-22561) Dynamically update topics list for spark kafka consumer
Arun created SPARK-22561:
----------------------------

             Summary: Dynamically update topics list for spark kafka consumer
                 Key: SPARK-22561
                 URL: https://issues.apache.org/jira/browse/SPARK-22561
             Project: Spark
          Issue Type: New Feature
          Components: DStreams
    Affects Versions: 2.2.0, 2.1.1, 2.1.0
            Reporter: Arun

The Spark Streaming application should allow adding new topics after the streaming context is initialized and the DStream is started. This is a very useful feature, especially when a business operates across multiple geographies or business units. For example, initially I have a spark-kafka consumer listening for the topics ["topic-1","topic-2"], and after a couple of days the new topics ["topic-3","topic-4"] are added to Kafka. Is there a way to update the spark-kafka consumer's topic list and have it consume data for the updated list of topics without stopping the Spark Streaming application or the streaming context?
[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603053#comment-14603053 ]

Arun commented on SPARK-8409:
-----------------------------

This is the link I posted:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-row-bind-two-data-frames-in-SparkR-td23502.html

In windows cant able to read .csv or .json files using read.df()
-----------------------------------------------------------------

                 Key: SPARK-8409
                 URL: https://issues.apache.org/jira/browse/SPARK-8409
             Project: Spark
          Issue Type: Bug
          Components: SparkR, Windows
    Affects Versions: 1.4.0
         Environment: sparkR API
            Reporter: Arun
            Priority: Critical

Hi,

In the SparkR shell, I invoke:
mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", header = "false")
I have tried various file types (csv, txt); all fail.

In SparkR of Spark 1.4, for example:
df_1 <- read.df(sqlContext, "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv", source = "csv")
RESPONSE: ERROR RBackendHandler: load on 1 failed
BELOW IS THE WHOLE RESPONSE:
15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with curMem=0, maxMem=278302556
15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 173.4 KB, free 265.2 MB)
15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with curMem=177600, maxMem=278302556
15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 16.2 KB, free 265.2 MB)
15/06/16 08:09:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:37142 (size: 16.2 KB, free: 265.4 MB)
15/06/16 08:09:13 INFO SparkContext: Created broadcast 0 from load at NativeMethodAccessorImpl.java:-2
15/06/16 08:09:16 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/06/16 08:09:17 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://smalldata13.hdp:8020/home/esten/ami/usaf.json
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
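[Editor's note] Two details stand out in this trace, and the sketch below only illustrates them; it is an assumption on my part, not a fix confirmed in this thread. First, a bare path is resolved against the cluster's default filesystem (hence the hdfs://... "Input path does not exist" at the bottom), so local files generally need a file:/// prefix. Second, source = "csv" is not a built-in data source in Spark 1.4; CSV is read through the spark-csv package under its full source name. The file paths are the reporter's own.

{code}
# a minimal sketch, assuming the SparkR shell was started with the csv package:
#   .\sparkR --packages com.databricks:spark-csv_2.10:1.0.3

# JSON is built in; the file:/// prefix keeps the path from being resolved
# against HDFS
mydf <- read.df(sqlContext, "file:///home/esten/ami/usaf.json", source = "json")

# CSV is provided by the spark-csv package; its source name is the full
# package name, not "csv"
df_1 <- read.df(sqlContext,
                "file:///E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
                source = "com.databricks.spark.csv",
                header = "true")
{code}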
[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603055#comment-14603055 ]

Arun commented on SPARK-8409:
-----------------------------

Another link I posted:
http://apache-spark-user-list.1001560.n3.nabble.com/Convert-R-code-into-SparkR-code-for-spark-1-4-version-td23489.html
[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603202#comment-14603202 ]

Arun commented on SPARK-8409:
-----------------------------

If possible, please reply from your email ID; I mailed you yesterday at shiva...@cs.berkeley.edu.

Thanks,
Arun
[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602706#comment-14602706 ]

Arun commented on SPARK-8409:
-----------------------------

Hi Shivaram,

1) spark-csv 2.11 works fine on my home internet connection, but it is not working on my office network; I have raised this issue with our office network admin and am waiting for them to get back.

2) I need a favor, Shiva: in R we use rbind() to bind two data frames, e.g. rbind(X, Y). How can we do the same in SparkR in Spark 1.4? I asked this question on the Spark user community mailing list but didn't get any answer.
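[Editor's note] For the rbind question, the closest SparkR 1.4 equivalent appears to be unionAll on two DataFrames with the same schema (base R's rbind() only works on local data.frames). A minimal sketch; the toy columns a and b are made up for illustration, and unionAll's availability in 1.4 is to the best of my knowledge:

{code}
# row-bind two SparkR DataFrames with identical schemas
X <- createDataFrame(sqlContext, data.frame(a = 1:3, b = c("x", "y", "z")))
Y <- createDataFrame(sqlContext, data.frame(a = 4:6, b = c("p", "q", "r")))

Z <- unionAll(X, Y)   # the SparkR analogue of rbind(X, Y)
count(Z)              # 6 rows
{code}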
[jira] [Created] (SPARK-8629) R code in SparkR
Arun created SPARK-8629:
----------------------------

             Summary: R code in SparkR
                 Key: SPARK-8629
                 URL: https://issues.apache.org/jira/browse/SPARK-8629
             Project: Spark
          Issue Type: Question
          Components: R
            Reporter: Arun
            Priority: Minor

Data set:

DC_City    Dc_Code  ItemNo  Itemdescription              date        Month   Year  SalesQuantity
Hyderabad  11       15010   more. Value Chana Dal 1 Kg.  9/16/2012   9-Sep   2012  1
Hyderabad  11       15010   more. Value Chana Dal 1 Kg.  12/21/2012  12-Dec  2012  1
Hyderabad  11       15010   more. Value Chana Dal 1 Kg.  1/12/2013   1-Jan   2013  1
Hyderabad  11       15010   more. Value Chana Dal 1 Kg.  1/27/2013   1-Jan   2013  3
Hyderabad  11       15011   more. Value Chana Dal 1 Kg.  2/1/2013    2-Feb   2013  2
Hyderabad  11       15011   more. Value Chana Dal 1 Kg.  2/12/2013   2-Feb   2013  3
Hyderabad  11       15011   more. Value Chana Dal 1 Kg.  2/13/2013   2-Feb   2013  2
Hyderabad  11       15011   more. Value Chana Dal 1 Kg.  2/14/2013   2-Feb   2013  1
Hyderabad  11       15011   more. Value Chana Dal 1 Kg.  2/15/2013   2-Feb   2013  8
Hyderabad  11       15012   more. Value Chana Dal 1 Kg.  2/16/2013   2-Feb   2013  18
Hyderabad  11       15012   more. Value Chana Dal 1 Kg.  2/17/2013   2-Feb   2013  19
Hyderabad  11       15012   more. Value Chana Dal 1 Kg.  2/18/2013   2-Feb   2013  18
Hyderabad  11       15012   more. Value Chana Dal 1 Kg.  2/19/2013   2-Feb   2013  18
Hyderabad  11       15012   more. Value Chana Dal 1 Kg.  2/20/2013   2-Feb   2013  16
Hyderabad  11       15013   more. Value Chana Dal 1 Kg.  2/21/2013   2-Feb   2013  25
Hyderabad  11       15013   more. Value Chana Dal 1 Kg.  2/22/2013   2-Feb   2013  19
Hyderabad  11       15013   more. Value Chana Dal 1 Kg.  2/23/2013   2-Feb   2013  17
Hyderabad  11       15013   more. Value Chana Dal 1 Kg.  2/24/2013   2-Feb   2013  39
Hyderabad  11       15013   more. Value Chana Dal 1 Kg.  2/25/2013   2-Feb   2013  23

Code I used in R:

data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE)
factors <- unique(data$ItemNo)
df.allitems <- data.frame()
for (i in 1:length(factors)) {
  data1 <- filter(data, ItemNo == factors[[i]])
  data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, SalesQuantity)  # select particular columns
  data2$date <- as.Date(data2$date, format = "%m/%d/%Y")  # format the date
  data3 <- data2[order(data2$date), ]  # order ascending
  df.allitems <- rbind(data3, df.allitems)  # append by row bind
}
write.csv(df.allitems, "E:/all_items.csv")

---------------------------------------------------------------------

I have done some SparkR code:

data1 <- read.csv("D:/Data_sale_quantity_mini.csv")  # read in R
df_1 <- createDataFrame(sqlContext, data1)           # convert the R data.frame to a Spark DF
distinctDF <- distinct(df_1)                         # remove duplicates
# for select I used:
df_2 <- select(distinctDF, "DC_City", "Itemdescription", "ItemNo", "date", "Year", "SalesQuantity")  # select action

I don't know how to:
1) create an empty SparkR DF
2) use a for loop in SparkR
3) change the date format
4) find the length() of a Spark DF
5) use rbind in SparkR

Can you help me do the above code in SparkR?
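[Editor's note] For reference, most of these steps have direct SparkR 1.4 equivalents (distinct, select, arrange, count, and unionAll in place of rbind). The sketch below reworks the pipeline under that assumption, with the date parsing done in plain R before the conversion, since SparkR 1.4 has no format-aware date parser; it is an illustration, not a reply from the thread.

{code}
library(SparkR)
sc <- sparkR.init(appName = "sales")
sqlContext <- sparkRSQL.init(sc)

# read locally, parse the m/d/yyyy dates in plain R, and re-format them as
# ISO yyyy-MM-dd strings, which Spark SQL's cast to date understands
localDF <- read.csv("D:/Data_sale_quantity_mini.csv", stringsAsFactors = FALSE)
localDF$date <- format(as.Date(localDF$date, "%m/%d/%Y"), "%Y-%m-%d")

df <- createDataFrame(sqlContext, localDF)    # R data.frame -> SparkR DataFrame
df <- distinct(df)                            # remove duplicate rows
df <- select(df, "DC_City", "Itemdescription", "ItemNo",
             "date", "Year", "SalesQuantity")             # keep the needed columns
df <- withColumn(df, "saleDate", cast(df$date, "date"))   # string -> date column
df <- arrange(df, df$saleDate)                # order ascending by date

count(df)                                     # row count, the length() analogue

# the empty-DF + rbind-in-a-loop pattern is usually unnecessary: filter/
# select/arrange already run over the whole distributed DataFrame, so the
# per-ItemNo loop collapses into the single sorted DataFrame above; where
# two DataFrames really must be stacked, unionAll(df1, df2) plays the role
# of rbind
write.csv(collect(df), "E:/all_items.csv")    # collect back to R and save
{code}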
[jira] [Updated] (SPARK-8629) R code in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun updated SPARK-8629:
------------------------
    Description: (same data set and code as in the issue above, with the following link added)

You can see the code clearly in:
http://apache-spark-user-list.1001560.n3.nabble.com/Convert-R-code-into-SparkR-code-for-spark-1-4-version-tp23489.html
[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595716#comment-14595716 ]

Arun commented on SPARK-8409:
-----------------------------

Dear Shivaram,

While using the Spark shell to install the package:
.\spark-shell --packages com.databricks:spark-csv_2.11:1.0.3
the csv package got successfully installed. The problem is with the sparkR shell only. Kindly suggest how to get it installed in the sparkR shell.

E:\spark-1.4.0-bin-hadoop2.6\bin> .\spark-shell --packages com.databricks:spark-csv_2.11:1.0.3
Ivy Default Cache set to: C:\Users\acer1\.ivy2\cache
The jars for the packages stored in: C:\Users\acer1\.ivy2\jars
:: loading settings :: url = jar:file:/E:/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found com.databricks#spark-csv_2.11;1.0.3 in central
        found org.apache.commons#commons-csv;1.1 in central
downloading https://repo1.maven.org/maven2/com/databricks/spark-csv_2.11/1.0.3/spark-csv_2.11-1.0.3.jar ...
        [SUCCESSFUL ] com.databricks#spark-csv_2.11;1.0.3!spark-csv_2.11.jar (705ms)
downloading https://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar ...
        [SUCCESSFUL ] org.apache.commons#commons-csv;1.1!commons-csv.jar (479ms)
:: resolution report :: resolve 11565ms :: artifacts dl 1200ms
        :: modules in use:
        com.databricks#spark-csv_2.11;1.0.3 from central in [default]
        org.apache.commons#commons-csv;1.1 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   2   |   2   |   2   |   0   ||   2   |   2   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        2 artifacts copied, 0 already retrieved (90kB/63ms)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/22 15:33:22 INFO SecurityManager: Changing view acls to: acer1
15/06/22 15:33:22 INFO SecurityManager: Changing modify acls to: acer1
15/06/22 15:33:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(acer1); users with modify permissions: Set(acer1)
15/06/22 15:33:22 INFO HttpServer: Starting HTTP Server
15/06/22 15:33:22 INFO Utils: Successfully started service 'HTTP class server' on port 53987.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
15/06/22 15:33:27 INFO SparkContext: Running Spark version 1.4.0
15/06/22 15:33:27 INFO SecurityManager: Changing view acls to: acer1
15/06/22 15:33:27 INFO SecurityManager: Changing modify acls to: acer1
15/06/22 15:33:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(acer1); users with modify permissions: Set(acer1)
15/06/22 15:33:28 INFO Slf4jLogger: Slf4jLogger started
15/06/22 15:33:28 INFO Remoting: Starting remoting
15/06/22 15:33:28 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.88.1:54000]
15/06/22 15:33:28 INFO Utils: Successfully started service 'sparkDriver' on port 54000.
15/06/22 15:33:28 INFO SparkEnv: Registering MapOutputTracker
15/06/22 15:33:28 INFO SparkEnv: Registering BlockManagerMaster
15/06/22 15:33:28 INFO DiskBlockManager: Created local directory at C:\Users\acer1\AppData\Local\Temp\spark-7805dd92-cc04-44f0-9b1c-2993939f7b21\blockmgr-b7c44ce9-7ad7-4a03-b041-7a0aa491de10
15/06/22 15:33:28 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/06/22 15:33:28 INFO HttpFileServer: HTTP File server directory is C:\Users\acer1\AppData\Local\Temp\spark-7805dd92-cc04-44f0-9b1c-2993939f7b21\httpd-a833b562-71a5-400e-85e3-821f4760348c
15/06/22 15:33:28 INFO HttpServer: Starting HTTP Server
15/06/22 15:33:28 INFO Utils: Successfully started service 'HTTP file server' on port 54001.
15/06/22 15:33:28 INFO SparkEnv: Registering OutputCommitCoordinator
15/06/22 15:33:28 INFO
[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595568#comment-14595568 ] Arun commented on SPARK-8409: - Hi Shivaram, I tried putting those files in the destination we discussed earlier, but that did not work. I then tried installing the csv package from my home network, which is private and not behind any restrictions or proxies, but I got the errors below. Could you run .\sparkR --packages com.databricks:spark-csv_2.10:1.0.3 in a Windows environment, to check whether the issue is in the network or in Windows itself?

{code}
E:\spark-1.4.0-bin-hadoop2.6\bin>.\sparkR --packages com.databricks:spark-csv_2.10:1.0.3

R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Launching java with spark-submit command E:\spark-1.4.0-bin-hadoop2.6\bin\../bin/spark-submit.cmd --packages com.databricks:spark-csv_2.10:1.0.3 sparkr-shell C:\Users\acer1\AppData\Local\Temp\Rtmp0gENwW\backend_port198831cf7692
Ivy Default Cache set to: C:\Users\acer1\.ivy2\cache
The jars for the packages stored in: C:\Users\acer1\.ivy2\jars
:: loading settings :: url = jar:file:/E:/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
:: resolution report :: resolve 23610ms :: artifacts dl 0ms
        :: modules in use:
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
        ---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
        Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom
        Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar
        Host dl.bintray.com not found. url=http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom
        Host dl.bintray.com not found. url=http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar
        module not found: com.databricks#spark-csv_2.10;1.0.3
        local-m2-cache: tried
          file:/C:/Users/acer1/.m2/repository/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom
          -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:
          file:/C:/Users/acer1/.m2/repository/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar
        local-ivy-cache: tried
          -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:
          file:/C:/Users/acer1/.ivy2/local/com.databricks\spark-csv_2.10\1.0.3\jars\spark-csv_2.10.jar
        central: tried
          https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom
          -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:
          https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar
        spark-packages: tried
          http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom
          -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:
          http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar
{code}
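One workaround when the machine genuinely cannot reach repo1.maven.org or dl.bintray.com is to fetch the jars on a connected machine and hand them to spark-submit directly with --jars, bypassing Ivy resolution altogether. A minimal sketch, assuming the jars were copied to E:\jars (the folder, the jar list, and the data path below are illustrative, not from the report; spark-csv declares a commons-csv dependency, so that jar is included on that assumption):

{code}
# Hypothetical offline launch, run from E:\spark-1.4.0-bin-hadoop2.6\bin:
#   .\sparkR --jars E:\jars\spark-csv_2.10-1.0.3.jar,E:\jars\commons-csv-1.1.jar
#
# With the jars on the classpath, the data source class should resolve:
df <- read.df(sqlContext,
              "E:/data/sample.csv",                 # hypothetical local file
              source = "com.databricks.spark.csv",
              header = "true")
head(df)
{code}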
[jira] [Comment Edited] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591723#comment-14591723 ] Arun edited comment on SPARK-8409 at 6/18/15 12:48 PM: --- Hi Shivaram, I got the error below when I did as you suggested and read a csv file from HDFS; please note that the HDFS path I have given is syntactically correct. TIA

{code}
df_1 <- read.df(sqlContext, "hdfs://ABRLMISDEV:8020/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv", "com.databricks.spark.csv", header="true")

15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:216)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1230)
        ... 25 more
Error: returnStatus == 0 is not TRUE
{code}
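The "Failed to load class for data source: com.databricks.spark.csv" line indicates the spark-csv jar never reached the JVM classpath, so the failed package download above, not the HDFS path, is the actual problem. Once SparkR is launched with the jar available, the same read should go through; a minimal sketch (host, port, and path taken from the comment above; note the source string must be a quoted string in R):

{code}
# Assumes sparkR was started with spark-csv on the classpath,
# e.g. --packages com.databricks:spark-csv_2.10:1.0.3 (or --jars with a local copy):
df_1 <- read.df(sqlContext,
                "hdfs://ABRLMISDEV:8020/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv",
                source = "com.databricks.spark.csv",
                header = "true")
{code}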
[jira] [Issue Comment Deleted] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun updated SPARK-8409: Comment: was deleted (was: Hi Shivaram, I got these errors when I did as you suggested; please check whether the HDFS connection I made is correct. TIA

{code}
df_1 <- read.df(sqlContext, "hdfs://ABRLMISDEV:8020/app.admin/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv", "com.databricks.spark.csv", header="true")

15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
        ...
Caused by: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
        ...
        ... 25 more
Error: returnStatus == 0 is not TRUE
{code}
)

In windows cant able to read .csv or .json files using read.df()
-----------------------------------------------------------------

                Key: SPARK-8409
                URL: https://issues.apache.org/jira/browse/SPARK-8409
            Project: Spark
         Issue Type: Bug
         Components: Build
   Affects Versions: 1.4.0
        Environment: sparkR API
           Reporter: Arun
           Priority: Critical
             Labels: build

Hi, in the SparkR shell I invoke:

{code}
mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source="json", header="false")
{code}

I have tried various file types (csv, txt); all fail. In sparkR of Spark 1.4, for example:

{code}
df_1 <- read.df(sqlContext, "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv", source = "csv")
{code}

RESPONSE: ERROR RBackendHandler: load on 1 failed. Below the whole response:

{code}
15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with curMem=0, maxMem=278302556
15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 173.4 KB, free 265.2 MB)
15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with curMem=177600, maxMem=278302556
15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 16.2 KB, free 265.2 MB)
15/06/16 08:09:13 INFO BlockManagerInfo: Added
{code}
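A note on the call in the description above: Spark 1.4 has no built-in csv data source (json and parquet are built in; a csv short name only appeared in much later Spark releases), so source = "csv" cannot resolve regardless of where the file lives. With the external spark-csv package on the classpath, the call would look like this sketch (the file path is the one from the description):

{code}
# Assumes SparkR was launched with spark-csv available,
# e.g. --packages com.databricks:spark-csv_2.10:1.0.3:
df_1 <- read.df(sqlContext,
                "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
                source = "com.databricks.spark.csv",
                header = "true")
{code}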
[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591715#comment-14591715 ] Arun commented on SPARK-8409: - Hi Shivaram, I got these errors when I did as you suggested; please check whether the HDFS connection I made is correct. TIA

{code}
df_1 <- read.df(sqlContext, "hdfs://ABRLMISDEV:8020/app.admin/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv", "com.databricks.spark.csv", header="true")

15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
        ...
Caused by: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
        ...
        ... 25 more
Error: returnStatus == 0 is not TRUE
{code}
[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591782#comment-14591782 ] Arun commented on SPARK-8409: - I think the reason for the above error is that the csv package was not downloaded or installed. I tried to install the package separately using the command below. Is there any other method by which I can install the package?

{code}
.\bin\sparkR --packages com.databricks:spark-csv_2.10:1.0.3

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Warning: namespace 'SparkR' is not available and has been replaced
by .GlobalEnv when processing object 'df'
[Previously saved workspace restored]

Launching java with spark-submit command E:\setup\spark-1.4.0-bin-hadoop2.6\spark-1.4.0-bin-hadoop2.6\bin\../bin/spark-submit.cmd --packages com.databricks:spark-csv_2.10:1.0.3 sparkr-shell C:\Users\RAJESH~1.KOD\AppData\Local\Temp\6\RtmpgTFIOz\backend_port987858e35a
Ivy Default Cache set to: C:\Users\rajesh.kodam-v\.ivy2\cache
The jars for the packages stored in: C:\Users\rajesh.kodam-v\.ivy2\jars
:: loading settings :: url = jar:file:/E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
:: resolution report :: resolve 96999ms :: artifacts dl 0ms
        :: modules in use:
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
        ---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
        module not found: com.databricks#spark-csv_2.10;1.0.3
        local-m2-cache: tried
          file:/C:/Users/rajesh.kodam-v/.m2/repository/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom
          -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:
          file:/C:/Users/rajesh.kodam-v/.m2/repository/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar
        local-ivy-cache: tried
          -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:
          file:/C:/Users/rajesh.kodam-v/.ivy2/local/com.databricks\spark-csv_2.10\1.0.3\jars\spark-csv_2.10.jar
        central: tried
          https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom
          -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:
          https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar
        spark-packages: tried
          http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom
          -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:
          http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar
        ::::::::::::::::::::::::::::::::::::::::::::::
        ::          UNRESOLVED DEPENDENCIES         ::
        ::::::::::::::::::::::::::::::::::::::::::::::
        :: com.databricks#spark-csv_2.10;1.0.3: not found
        ::::::::::::::::::::::::::::::::::::::::::::::
:::: ERRORS
        Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom (java.net.ConnectException: Connection timed out: connect)
        Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar (java.net.ConnectException: Connection timed out: connect)
        Server access error at url http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom (java.net.ConnectException: Connection timed out: connect)
        Server access error at url
{code}
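The "Connection timed out" errors come from the Ivy resolver inside spark-submit, which does not inherit browser or OS proxy settings. One commonly suggested approach, sketched here with placeholder values, is to pass the standard Java proxy system properties to the launching JVM, for instance through the JAVA_TOOL_OPTIONS environment variable that JVMs honor (set it in the Windows environment before running sparkR, or from R before the backend starts):

{code}
# Hypothetical proxy setup; proxy.example.com and 8080 are placeholders.
# Run before sparkR.init() so the spawned JVM inherits the variable:
Sys.setenv(JAVA_TOOL_OPTIONS = paste(
  "-Dhttp.proxyHost=proxy.example.com",
  "-Dhttp.proxyPort=8080",
  "-Dhttps.proxyHost=proxy.example.com",
  "-Dhttps.proxyPort=8080"))
{code}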
[jira] [Reopened] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun reopened SPARK-8409: - Hi Shivaram, I got the error below when I did as you suggested and read a csv file from HDFS; please note that the HDFS path I have given is syntactically correct. TIA

{code}
df_1 <- read.df(sqlContext, "hdfs://ABRLMISDEV:8020/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv", "com.databricks.spark.csv", header="true")

15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
        ...
Caused by: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
        ...
        ... 25 more
Error: returnStatus == 0 is not TRUE
{code}
[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592050#comment-14592050 ] Arun commented on SPARK-8409: - I am using a Windows machine and I have downloaded spark-csv_2.10-1.0.3.jar. If I place this jar file in the Spark lib folder, will it work properly?
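On the lib-folder question: in the prebuilt 1.4.0 package that folder is, as far as I can tell, scanned only for the assembly (and datanucleus) jars, so a loose jar dropped there is not picked up automatically; passing the file explicitly is the safer route. A sketch, where the jar location is an assumption:

{code}
# Hypothetical: hand the downloaded jar to spark-submit directly.
# From the bin folder:
#   .\sparkR --jars E:\jars\spark-csv_2.10-1.0.3.jar
#
# Or from an existing R session, before sparkR.init() creates the backend
# (SPARKR_SUBMIT_ARGS is read by SparkR 1.4 and must end with "sparkr-shell"):
Sys.setenv(SPARKR_SUBMIT_ARGS = "--jars E:/jars/spark-csv_2.10-1.0.3.jar sparkr-shell")
{code}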
[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592067#comment-14592067 ] Arun commented on SPARK-8409: - OK Shivaram, I will try it out tomorrow and let you know. Thanks
[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590119#comment-14590119 ] Arun commented on SPARK-8409: - OK, thanks a lot Shivaram.
[jira] [Updated] (SPARK-8409) In windows cant able to read .csv or .json files using read.df() in sparkR of spark 1.4 for eg.) df_1- read.df(sqlContext, E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun updated SPARK-8409: Summary: In windows cant able to read .csv or .json files using read.df() in sparkR of spark 1.4 for eg.) df_1- read.df(sqlContext, E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv, source = csv) (was: In windows cant able to read .csv files using read.df() in sparkR of spark 1.4 for eg.) df_1- read.df(sqlContext, E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv, source = csv))
[jira] [Updated] (SPARK-8409) In windows cant able to read .csv or .json files using read.df() in sparkR of spark 1.4 for eg.) df_1- read.df(sqlContext, E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun updated SPARK-8409: Description:
Hi, in the SparkR shell I invoke:
mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", header = "false")
I have tried various file types (csv, txt); all fail.
RESPONSE: ERROR RBackendHandler: load on 1 failed
BELOW THE WHOLE RESPONSE:
{code:java}
15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with curMem=0, maxMem=278302556
15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 173.4 KB, free 265.2 MB)
15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with curMem=177600, maxMem=278302556
15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 16.2 KB, free 265.2 MB)
15/06/16 08:09:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:37142 (size: 16.2 KB, free: 265.4 MB)
15/06/16 08:09:13 INFO SparkContext: Created broadcast 0 from load at NativeMethodAccessorImpl.java:-2
15/06/16 08:09:16 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/06/16 08:09:17 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
	at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74)
	at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://smalldata13.hdp:8020/home/esten/ami/usaf.json
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
{code}
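The root cause is visible in the last frames: the bare path /home/esten/ami/usaf.json was resolved against the cluster's default filesystem (hdfs://smalldata13.hdp:8020), so Spark looked for the file on HDFS rather than on the local disk. A minimal sketch of the two usual workarounds, assuming Spark 1.4 SparkR with an existing SparkContext `sc` and that the file actually lives on the driver's local filesystem (the HDFS target directory below is illustrative):

{code:r}
# Minimal sketch, assuming Spark 1.4 SparkR and that usaf.json exists on the
# local filesystem of the machine running the shell.
sqlContext <- sparkRSQL.init(sc)

# A bare path is resolved against fs.defaultFS (here hdfs://smalldata13.hdp:8020),
# which is why the trace reports "Input path does not exist" on HDFS.
# An explicit scheme avoids that; note that file:// only works if the path is
# visible from every node that runs the read (e.g. local mode):
mydf <- read.df(sqlContext, "file:///home/esten/ami/usaf.json", source = "json")

# Alternatively, copy the file into HDFS first and read it from there
# (the HDFS target path below is an assumption):
#   hadoop fs -put /home/esten/ami/usaf.json /user/esten/usaf.json
mydf <- read.df(sqlContext, "hdfs://smalldata13.hdp:8020/user/esten/usaf.json",
                source = "json")
{code}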
[jira] [Created] (SPARK-8409) Unable to read .csv files using read.df() in SparkR of Spark 1.4, e.g. mydf <- read.df(sqlContext, /home/esten/ami/usaf.json, source=json, header=false)
Arun created SPARK-8409:
---
Summary: Unable to read .csv files using read.df() in SparkR of Spark 1.4, e.g. mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", header = "false")
Key: SPARK-8409
URL: https://issues.apache.org/jira/browse/SPARK-8409
Project: Spark
Issue Type: Bug
Components: Build
Affects Versions: 1.4.0
Environment: SparkR API
Reporter: Arun
Priority: Critical

Hi, in the SparkR shell I invoke:
mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", header = "false")
I have tried various file types (csv, txt); all fail.
RESPONSE: ERROR RBackendHandler: load on 1 failed
BELOW THE WHOLE RESPONSE: (identical to the log and stack trace quoted above, ending in)
{code:java}
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://smalldata13.hdp:8020/home/esten/ami/usaf.json
{code}
[jira] [Updated] (SPARK-8409) In Windows, unable to read .csv files using read.df() in SparkR of Spark 1.4, e.g. df_1 <- read.df(sqlContext, E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-ha
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun updated SPARK-8409:
Summary: In Windows, unable to read .csv files using read.df() in SparkR of Spark 1.4, e.g. df_1 <- read.df(sqlContext, "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv", source = "csv")
(was: Unable to read .csv files using read.df() in SparkR of Spark 1.4, e.g. mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", header = "false"))
Key: SPARK-8409
URL: https://issues.apache.org/jira/browse/SPARK-8409
Project: Spark
Issue Type: Bug
Components: Build
Affects Versions: 1.4.0
Environment: SparkR API
Reporter: Arun
Priority: Critical
Labels: build
Description: the same report, log, and stack trace quoted above.
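For the Windows variant in this summary there are likely two separate problems. First, Spark 1.4 has no built-in "csv" data source, so read.df needs the external spark-csv package with its full source name; second, a bare Windows path can again be resolved against the default filesystem, so an explicit file:/// URI is safer. A sketch under those assumptions (the spark-csv version and the header option are illustrative):

{code:r}
# Sketch for Spark 1.4 SparkR on Windows; the package version is an assumption.
# Start the shell with the external CSV data source on the classpath:
#   bin\sparkR --packages com.databricks:spark-csv_2.10:1.0.3
sqlContext <- sparkRSQL.init(sc)

# Use the package's full source name ("csv" alone is not a built-in source in
# 1.4) and a file:/// URI so the Windows path is not resolved against HDFS:
df_1 <- read.df(sqlContext,
                "file:///E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
                source = "com.databricks.spark.csv",
                header = "true")
{code}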
[jira] [Commented] (SPARK-8409) In Windows, unable to read .csv or .json files using read.df()
[ https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590100#comment-14590100 ] Arun commented on SPARK-8409:
Thanks Sivram, I have a few doubts to clarify:
1) How can we increase the memory of the Spark standalone cluster? By default it has only around 234 MB. I am using Spark 1.4 SparkR in a Windows environment. What I found says to edit spark-env.sh, but which settings do I put in it?
2) I have a Hortonworks standalone cluster (10.200.202.85:8020); can we make it the master node in the Spark context? If yes, what is the command in a Windows environment?
3) Or, keeping the Spark cluster as master, can we connect the Hortonworks standalone cluster as a worker node? If possible, what is the command in a Windows environment?
If I am wrong, kindly correct me. TIA, Arun Gunalan
> In Windows, unable to read .csv or .json files using read.df()
> ---
> Key: SPARK-8409
> URL: https://issues.apache.org/jira/browse/SPARK-8409
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 1.4.0
> Environment: SparkR API
> Reporter: Arun
> Priority: Critical
> Labels: build
>
> Hi, in the SparkR shell I invoke:
> mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", header = "false")
> I have tried various file types (csv, txt); all fail. In SparkR of Spark 1.4, e.g. df_1 <- read.df(sqlContext, "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv", source = "csv")
> RESPONSE: ERROR RBackendHandler: load on 1 failed
> (The rest of the quoted description is the same log and stack trace shown above.)
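On the questions in the comment, a hedged sketch rather than an authoritative answer: the ~265 MB in the log is the default driver storage memory, which has to be raised when the JVM is launched, and a Hadoop/Hortonworks namenode (10.200.202.85:8020) is an HDFS endpoint to read data from, not a Spark master to attach to. The master URL, memory sizes, and paths below are assumptions:

{code:r}
# Driver memory is fixed when the JVM starts, so raise it on the command line
# (setting SPARK_DRIVER_MEMORY in conf/spark-env.sh / spark-env.cmd also works):
#   bin\sparkR --driver-memory 2g
# Hypothetical standalone master URL; a Spark master listens on 7077 by default.
sc <- sparkR.init(master = "spark://spark-master-host:7077",
                  appName = "example",
                  sparkEnvir = list(spark.executor.memory = "2g"))
sqlContext <- sparkRSQL.init(sc)

# The Hortonworks endpoint 10.200.202.85:8020 is an HDFS namenode, so address
# it with a full hdfs:// URI instead of using it as a Spark master:
df <- read.df(sqlContext, "hdfs://10.200.202.85:8020/path/to/usaf.json",
              source = "json")
{code}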