[jira] [Commented] (SPARK-12378) CREATE EXTERNAL TABLE AS SELECT EXPORT AWS S3 ERROR

2018-02-09 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358393#comment-16358393
 ] 

Arun commented on SPARK-12378:
--

I am also getting the same issue when trying to insert data into Hive from 
Spark.

My table is an external table stored in AWS S3.

The data does get inserted into the table, but it prints this message:

 
{code:java}
-chgrp: '' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
18/02/09 13:25:56 ERROR KeyProviderCache: Could not find uri with key 
[dfs.encryption.key.provider.uri] to create a keyProvider !!
-chgrp: '' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...{code}
Any resolution please?

> CREATE EXTERNAL TABLE AS SELECT EXPORT AWS S3 ERROR
> ---
>
> Key: SPARK-12378
> URL: https://issues.apache.org/jira/browse/SPARK-12378
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
> Environment: AWS EMR 4.2.0
> Just Master Running m3.xlarge
> Applications:
> Hive 1.0.0
> Spark 1.5.2
>Reporter: CESAR MICHELETTI
>Priority: Major
>
> I receive the error below when trying to export data to AWS S3 from 
> spark-sql.
> Command:
> CREATE external TABLE export 
>  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
> -- lines terminated by '\n' 
>  STORED AS TEXTFILE
>  LOCATION 's3://xxx/yyy'
>  AS
> SELECT 
> xxx
> 
> (complete query)
> ;
> Error:
> -chgrp: '' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: '' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> 15/12/16 21:09:25 ERROR SparkSQLDriver: Failed in [CREATE external TABLE 
> csvexport
> ...
> (create table + query)
> ...
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:441)
> at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply$mcV$sp(ClientWrapper.scala:489)
> at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489)
> at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489)
> at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:256)
> at 
> org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:211)
> at 
> org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:248)
> at 
> org.apache.spark.sql.hive.client.ClientWrapper.loadTable(ClientWrapper.scala:488)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:263)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
> at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
> at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
> at 
> org.apache.spark.sql.hive.execution.CreateTableAsSelect.run(CreateTableAsSelect.scala:89)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
> at 
> 

[jira] [Reopened] (SPARK-22561) Dynamically update topics list for spark kafka consumer

2017-11-23 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun reopened SPARK-22561:
--

Thanks [~c...@koeninger.org] - The SubscribePattern allows you to use a regex 
to specify topics of interest. Note that unlike the 0.8 integration, using 
Subscribe or SubscribePattern should respond to adding partitions during a 
running stream.

I tested SubscribePattern and it works well when we don't want to pass an 
explicit list of topics - Spark Streaming can pick up topics matching the 
regex and start processing them.

But my question is not about loading topics based on a pattern - the question 
is: once the stream is materialized and running, I would like to add a new 
topic on the fly without restarting the job.
 

> Dynamically update topics list for spark kafka consumer
> ---
>
> Key: SPARK-22561
> URL: https://issues.apache.org/jira/browse/SPARK-22561
> Project: Spark
>  Issue Type: New Feature
>  Components: DStreams
>Affects Versions: 2.1.0, 2.1.1, 2.2.0
>Reporter: Arun
>
> The Spark Streaming application should allow adding new topics after the 
> streaming context is initialized and the DStream is started. This is a very 
> useful feature, especially when a business operates across multiple 
> geographies or business units. For example, initially I have a spark-kafka 
> consumer listening for topics ["topic-1","topic-2"], and after a couple of 
> days I add new topics ["topic-3","topic-4"] to Kafka. Is there a way to 
> update the spark-kafka consumer's topic list and ask it to consume the 
> updated list of topics without stopping the Spark Streaming application or 
> streaming context?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22561) Dynamically update topics list for spark kafka consumer

2017-11-20 Thread Arun (JIRA)
Arun created SPARK-22561:


 Summary: Dynamically update topics list for spark kafka consumer
 Key: SPARK-22561
 URL: https://issues.apache.org/jira/browse/SPARK-22561
 Project: Spark
  Issue Type: New Feature
  Components: DStreams
Affects Versions: 2.2.0, 2.1.1, 2.1.0
Reporter: Arun


The Spark Streaming application should allow adding new topics after the 
streaming context is initialized and the DStream is started. This is a very 
useful feature, especially when a business operates across multiple geographies 
or business units.

For example, initially I have a spark-kafka consumer listening for topics 
["topic-1","topic-2"], and after a couple of days I add new topics 
["topic-3","topic-4"] to Kafka. Is there a way to update the spark-kafka 
consumer's topic list and ask it to consume the updated list of topics without 
stopping the Spark Streaming application or streaming context?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-26 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603053#comment-14603053
 ] 

Arun commented on SPARK-8409:
-

http://apache-spark-user-list.1001560.n3.nabble.com/How-to-row-bind-two-data-frames-in-SparkR-td23502.html
  
 This is the link I posted

  In windows cant able to read .csv or .json files using read.df()
 -

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: SparkR, Windows
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical

 Hi, 
 In SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", 
  header = "false") 
 I have tried various filetypes (csv, txt); all fail. 
  In sparkR of Spark 1.4, for example: df_1 <- read.df(sqlContext, 
 "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
  source = "csv")
 RESPONSE: ERROR RBackendHandler: load on 1 failed 
 BELOW THE WHOLE RESPONSE: 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with 
 curMem=0, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 173.4 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with 
 curMem=177600, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 16.2 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
 on localhost:37142 (size: 16.2 KB, free: 265.4 MB) 
 15/06/16 08:09:13 INFO SparkContext: Created broadcast 0 from load at 
 NativeMethodAccessorImpl.java:-2 
 15/06/16 08:09:16 WARN DomainSocketFactory: The short-circuit local reads 
 feature cannot be used because libhadoop cannot be loaded. 
 15/06/16 08:09:17 ERROR RBackendHandler: load on 1 failed 
 java.lang.reflect.InvocationTargetException 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  
 at java.lang.reflect.Method.invoke(Method.java:606) 
 at 
 org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
  
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74) 
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36) 
 at 
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
  
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  
 at 
 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
  
 at java.lang.Thread.run(Thread.java:745) 
 Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does 
 not exist: hdfs://smalldata13.hdp:8020/home/esten/ami/usaf.json 
 at 
 org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
  
 

[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-26 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603055#comment-14603055
 ] 

Arun commented on SPARK-8409:
-

http://apache-spark-user-list.1001560.n3.nabble.com/Convert-R-code-into-SparkR-code-for-spark-1-4-version-td23489.html

Another link I posted

  In windows cant able to read .csv or .json files using read.df()
 -

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: SparkR, Windows
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical

 Hi, 
 In SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", 
  header = "false") 
 I have tried various filetypes (csv, txt); all fail. 
  In sparkR of Spark 1.4, for example: df_1 <- read.df(sqlContext, 
 "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
  source = "csv")
 RESPONSE: ERROR RBackendHandler: load on 1 failed 
 BELOW THE WHOLE RESPONSE: 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with 
 curMem=0, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 173.4 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with 
 curMem=177600, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 16.2 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
 on localhost:37142 (size: 16.2 KB, free: 265.4 MB) 
 15/06/16 08:09:13 INFO SparkContext: Created broadcast 0 from load at 
 NativeMethodAccessorImpl.java:-2 
 15/06/16 08:09:16 WARN DomainSocketFactory: The short-circuit local reads 
 feature cannot be used because libhadoop cannot be loaded. 
 15/06/16 08:09:17 ERROR RBackendHandler: load on 1 failed 
 java.lang.reflect.InvocationTargetException 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  
 at java.lang.reflect.Method.invoke(Method.java:606) 
 at 
 org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
  
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74) 
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36) 
 at 
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
  
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  
 at 
 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
  
 at java.lang.Thread.run(Thread.java:745) 
 Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does 
 not exist: hdfs://smalldata13.hdp:8020/home/esten/ami/usaf.json 
 at 
 

[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-26 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603202#comment-14603202
 ] 

Arun commented on SPARK-8409:
-

If possible, please reply from your ID; I mailed you yesterday at 
shiva...@cs.berkeley.edu

Thanks,
Arun 

  In windows cant able to read .csv or .json files using read.df()
 -

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: SparkR, Windows
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical

 Hi, 
 In SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", 
  header = "false") 
 I have tried various filetypes (csv, txt); all fail. 
  In sparkR of Spark 1.4, for example: df_1 <- read.df(sqlContext, 
 "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
  source = "csv")
 RESPONSE: ERROR RBackendHandler: load on 1 failed 
 BELOW THE WHOLE RESPONSE: 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with 
 curMem=0, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 173.4 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with 
 curMem=177600, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 16.2 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
 on localhost:37142 (size: 16.2 KB, free: 265.4 MB) 
 15/06/16 08:09:13 INFO SparkContext: Created broadcast 0 from load at 
 NativeMethodAccessorImpl.java:-2 
 15/06/16 08:09:16 WARN DomainSocketFactory: The short-circuit local reads 
 feature cannot be used because libhadoop cannot be loaded. 
 15/06/16 08:09:17 ERROR RBackendHandler: load on 1 failed 
 java.lang.reflect.InvocationTargetException 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  
 at java.lang.reflect.Method.invoke(Method.java:606) 
 at 
 org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
  
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74) 
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36) 
 at 
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
  
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  
 at 
 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
  
 at java.lang.Thread.run(Thread.java:745) 
 Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does 
 not exist: hdfs://smalldata13.hdp:8020/home/esten/ami/usaf.json 
 at 
 org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
  
 at 
 

[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-26 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602706#comment-14602706
 ] 

Arun commented on SPARK-8409:
-

Hi Shivaram,
1.) spark-csv 2.11 works fine on my home internet connection, but it is not 
working on my office network. I have raised this with our office network admin 
and am waiting for them to get back to me.
2.) I need a favour, Shivaram: in R we use rbind() to bind two data frames, 
e.g. rbind(X, Y). 
How can we do the same in SparkR on Spark 1.4? I asked this question on the 
Spark user mailing list but did not get any answer.
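
The closest DataFrame analogue I know of is unionAll(). A minimal sketch, 
assuming this SparkR build exposes unionAll() and that both DataFrames share 
the same schema and an existing sqlContext:

{code:r}
# rbind(X, Y) on SparkR DataFrames: unionAll() row-binds two DataFrames
# that have identical column names and types.
df_x <- createDataFrame(sqlContext, data.frame(id = 1:3, val = c("a", "b", "c")))
df_y <- createDataFrame(sqlContext, data.frame(id = 4:5, val = c("d", "e")))

df_all <- unionAll(df_x, df_y)
count(df_all)   # 5 rows
head(df_all)
{code}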


  In windows cant able to read .csv or .json files using read.df()
 -

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: SparkR, Windows
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical

 Hi, 
 In SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", 
  header = "false") 
 I have tried various filetypes (csv, txt); all fail. 
  In sparkR of Spark 1.4, for example: df_1 <- read.df(sqlContext, 
 "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
  source = "csv")
 RESPONSE: ERROR RBackendHandler: load on 1 failed 
 BELOW THE WHOLE RESPONSE: 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with 
 curMem=0, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 173.4 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with 
 curMem=177600, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 16.2 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
 on localhost:37142 (size: 16.2 KB, free: 265.4 MB) 
 15/06/16 08:09:13 INFO SparkContext: Created broadcast 0 from load at 
 NativeMethodAccessorImpl.java:-2 
 15/06/16 08:09:16 WARN DomainSocketFactory: The short-circuit local reads 
 feature cannot be used because libhadoop cannot be loaded. 
 15/06/16 08:09:17 ERROR RBackendHandler: load on 1 failed 
 java.lang.reflect.InvocationTargetException 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  
 at java.lang.reflect.Method.invoke(Method.java:606) 
 at 
 org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
  
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74) 
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36) 
 at 
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
  
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  
 at 
 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
  
 at 

[jira] [Created] (SPARK-8629) R code in SparkR

2015-06-25 Thread Arun (JIRA)
Arun created SPARK-8629:
---

 Summary: R code in SparkR
 Key: SPARK-8629
 URL: https://issues.apache.org/jira/browse/SPARK-8629
 Project: Spark
  Issue Type: Question
  Components: R
Reporter: Arun
Priority: Minor


Data set:  
  
DC_City    Dc_Code  ItemNo  Itemdescription              date        Month   Year  SalesQuantity
Hyderabad  11       15010   more. Value Chana Dal 1 Kg.  9/16/2012   9-Sep   2012  1
Hyderabad  11       15010   more. Value Chana Dal 1 Kg.  12/21/2012  12-Dec  2012  1
Hyderabad  11       15010   more. Value Chana Dal 1 Kg.  1/12/2013   1-Jan   2013  1
Hyderabad  11       15010   more. Value Chana Dal 1 Kg.  1/27/2013   1-Jan   2013  3
Hyderabad  11       15011   more. Value Chana Dal 1 Kg.  2/1/2013    2-Feb   2013  2
Hyderabad  11       15011   more. Value Chana Dal 1 Kg.  2/12/2013   2-Feb   2013  3
Hyderabad  11       15011   more. Value Chana Dal 1 Kg.  2/13/2013   2-Feb   2013  2
Hyderabad  11       15011   more. Value Chana Dal 1 Kg.  2/14/2013   2-Feb   2013  1
Hyderabad  11       15011   more. Value Chana Dal 1 Kg.  2/15/2013   2-Feb   2013  8
Hyderabad  11       15012   more. Value Chana Dal 1 Kg.  2/16/2013   2-Feb   2013  18
Hyderabad  11       15012   more. Value Chana Dal 1 Kg.  2/17/2013   2-Feb   2013  19
Hyderabad  11       15012   more. Value Chana Dal 1 Kg.  2/18/2013   2-Feb   2013  18
Hyderabad  11       15012   more. Value Chana Dal 1 Kg.  2/19/2013   2-Feb   2013  18
Hyderabad  11       15012   more. Value Chana Dal 1 Kg.  2/20/2013   2-Feb   2013  16
Hyderabad  11       15013   more. Value Chana Dal 1 Kg.  2/21/2013   2-Feb   2013  25
Hyderabad  11       15013   more. Value Chana Dal 1 Kg.  2/22/2013   2-Feb   2013  19
Hyderabad  11       15013   more. Value Chana Dal 1 Kg.  2/23/2013   2-Feb   2013  17
Hyderabad  11       15013   more. Value Chana Dal 1 Kg.  2/24/2013   2-Feb   2013  39
Hyderabad  11       15013   more. Value Chana Dal 1 Kg.  2/25/2013   2-Feb   2013  23


Code I used in R:

  data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE) 
  factors <- unique(data$ItemNo) 
  df.allitems <- data.frame() 
  for (i in 1:length(factors)) 
  { 
    data1 <- filter(data, ItemNo == factors[[i]]) 
    data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, 
SalesQuantity) # select particular columns 
    data2$date <- as.Date(data2$date, format = "%m/%d/%Y") # format the date 
    data3 <- data2[order(data2$date), ] # order ascending by date 
    df.allitems <- rbind(data3, df.allitems) # append by row bind 
  } 
  
  write.csv(df.allitems, "E:/all_items.csv") 

--- 
  
I have done some SparkR code: 
  data1 <- read.csv("D:/Data_sale_quantity_mini.csv") # read in R 
  df_1 <- createDataFrame(sqlContext, data1) # converts the R data.frame to a Spark DF 
  factors <- distinct(df_1) # remove duplicates 
  
# for select I used: 
  df_2 <- select(distinctDF, "DC_City", "Itemdescription", "ItemNo", "date", 
"Year", "SalesQuantity") # select action 

I don't know how to: 
  1) create an empty SparkR DF 
  2) use a for loop in SparkR 
  3) change the date format 
  4) find the length() of a Spark DF 
  5) use rbind in SparkR 
  
Can you help me out in doing the above code in SparkR?
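
A minimal sketch of how this might look against the SparkR 1.4 DataFrame API, 
assuming the column names shown above, an existing sqlContext, and no per-item 
loop (sorting the whole DataFrame once makes the loop and rbind unnecessary):

{code:r}
# Read with base R, fix the date there, then hand the data to Spark.
local_df <- read.csv("D:/Data_sale_quantity_mini.csv", stringsAsFactors = FALSE)

# 3) reformat the date as ISO yyyy-mm-dd; as a string it then sorts chronologically
local_df$date <- format(as.Date(local_df$date, format = "%m/%d/%Y"), "%Y-%m-%d")

df_1 <- createDataFrame(sqlContext, local_df)

# 4) row count of a SparkR DataFrame (the analogue of nrow())
count(df_1)

# 2) and 5) no for loop plus rbind needed: sort by item and date in one pass
df_sorted <- arrange(df_1, df_1$ItemNo, df_1$date)

# 1) if an empty DataFrame with the same schema is really needed, take zero rows
df_empty <- limit(df_1, 0)

head(df_sorted)
{code}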




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8629) R code in SparkR

2015-06-25 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun updated SPARK-8629:

Description: 
Data set:  
  
DC_City Dc_Code ItemNo  Itemdescription dat   
Month YearSalesQuantity 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
9/16/2012   9-Sep 2012   1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
12/21/2012  12-Dec2012 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/12/2013   1-Jan   2013 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/27/2013   1-Jan   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/1/20132-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/12/2013   2-Feb   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/13/2013   2-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/14/2013   2-Feb   2013 1 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/15/2013   2-Feb   2013 8 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/16/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/17/2013   2-Feb   2013 19 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/18/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/19/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/20/2013   2-Feb   2013 16 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/21/2013   2-Feb   2013 25 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/22/2013   2-Feb   2013 19 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/23/2013   2-Feb   2013 17 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/24/2013   2-Feb   2013 39 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/25/2013   2-Feb   2013 23 


Code I used in R:

  data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE) 
  factors <- unique(data$ItemNo) 
  df.allitems <- data.frame() 
  for (i in 1:length(factors)) 
  { 
    data1 <- filter(data, ItemNo == factors[[i]]) 
    data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, 
SalesQuantity) 
    data2$date <- as.Date(data2$date, format = "%m/%d/%Y") 
    data3 <- data2[order(data2$date), ] 
    df.allitems <- rbind(data3, df.allitems) # append by row bind 
  } 
  
  write.csv(df.allitems, "E:/all_items.csv") 

--- 
  
I have done some SparkR code: 
  data1 <- read.csv("D:/Data_sale_quantity_mini.csv") # read in R 
  df_1 <- createDataFrame(sqlContext, data1) # converts the R data.frame to a Spark DF 
  factors <- distinct(df_1) # remove duplicates 
  
# for select I used: 
  df_2 <- select(distinctDF, "DC_City", "Itemdescription", "ItemNo", "date", 
"Year", "SalesQuantity") # select action 

I don't know how to: 
  1) create an empty SparkR DF 
  2) use a for loop in SparkR 
  3) change the date format 
  4) find the length() of a Spark DF 
  5) use rbind in SparkR 
  
Can you help me out in doing the above code in SparkR?


  was:
Data set:  
  
DC_City Dc_Code ItemNo  Itemdescription dat   
Month YearSalesQuantity 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
9/16/2012   9-Sep 2012   1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
12/21/2012  12-Dec2012 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/12/2013   1-Jan   2013 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/27/2013   1-Jan   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/1/20132-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/12/2013   2-Feb   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/13/2013   2-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/14/2013   2-Feb   2013 1 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/15/2013   2-Feb   2013 8 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/16/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/17/2013   2-Feb   2013 19 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg.   

[jira] [Updated] (SPARK-8629) R code in SparkR

2015-06-25 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun updated SPARK-8629:

Description: 
Data set:  
  
DC_City Dc_Code ItemNo  Itemdescription dat   
Month YearSalesQuantity 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
9/16/2012   9-Sep 2012   1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
12/21/2012  12-Dec2012 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/12/2013   1-Jan   2013 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/27/2013   1-Jan   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/1/20132-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/12/2013   2-Feb   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/13/2013   2-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/14/2013   2-Feb   2013 1 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/15/2013   2-Feb   2013 8 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/16/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/17/2013   2-Feb   2013 19 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/18/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/19/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/20/2013   2-Feb   2013 16 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/21/2013   2-Feb   2013 25 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/22/2013   2-Feb   2013 19 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/23/2013   2-Feb   2013 17 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/24/2013   2-Feb   2013 39 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/25/2013   2-Feb   2013 23 


Code I used in R:

  data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE) 
  factors <- unique(data$ItemNo) 
  df.allitems <- data.frame() 
  for (i in 1:length(factors)) 
  { 
    data1 <- filter(data, ItemNo == factors[[i]]) 
    data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, 
SalesQuantity) # select particular columns 
    data2$date <- as.Date(data2$date, format = "%m/%d/%Y") # format the date 
    data3 <- data2[order(data2$date), ] # order ascending by date 
    df.allitems <- rbind(data3, df.allitems) # append by row bind 
  } 
  
  write.csv(df.allitems, "E:/all_items.csv") 

--- 
  
I have done some SparkR code: 
  data1 <- read.csv("D:/Data_sale_quantity_mini.csv") # read in R 
  df_1 <- createDataFrame(sqlContext, data1) # converts the R data.frame to a Spark DF 
  factors <- distinct(df_1) # remove duplicates 
  
# for select I used: 
  df_2 <- select(distinctDF, "DC_City", "Itemdescription", "ItemNo", "date", 
"Year", "SalesQuantity") # select action 

I don't know how to: 
  1) create an empty SparkR DF 
  2) use a for loop in SparkR 
  3) change the date format 
  4) find the length() of a Spark DF 
  5) use rbind in SparkR 
  
Can you help me out in doing the above code in SparkR?


  was:
Data set:  
  
DC_City Dc_Code ItemNo  Itemdescription dat   
Month YearSalesQuantity 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
9/16/2012   9-Sep 2012   1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
12/21/2012  12-Dec2012 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/12/2013   1-Jan   2013 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/27/2013   1-Jan   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/1/20132-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/12/2013   2-Feb   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/13/2013   2-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/14/2013   2-Feb   2013 1 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/15/2013   2-Feb   2013 8 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/16/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/17/2013   

[jira] [Updated] (SPARK-8629) R code in SparkR

2015-06-25 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun updated SPARK-8629:

Description: 
Data set:  
  
DC_City Dc_Code ItemNo  Itemdescription dat   
Month YearSalesQuantity 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
9/16/2012   9-Sep 2012   1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
12/21/2012  12-Dec2012 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/12/2013   1-Jan   2013 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/27/2013   1-Jan   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/1/20132-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/12/2013   2-Feb   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/13/2013   2-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/14/2013   2-Feb   2013 1 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/15/2013   2-Feb   2013 8 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/16/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/17/2013   2-Feb   2013 19 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/18/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/19/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/20/2013   2-Feb   2013 16 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/21/2013   2-Feb   2013 25 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/22/2013   2-Feb   2013 19 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/23/2013   2-Feb   2013 17 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/24/2013   2-Feb   2013 39 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/25/2013   2-Feb   2013 23 


Code I used in R:

  data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE) 
  factors <- unique(data$ItemNo) 
  df.allitems <- data.frame() 
  for (i in 1:length(factors)) 
  { 
    data1 <- filter(data, ItemNo == factors[[i]]) 
    data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, 
SalesQuantity) 
    data2$date <- as.Date(data2$date, format = "%m/%d/%Y") 
    data3 <- data2[order(data2$date), ] 
    df.allitems <- rbind(data3, df.allitems) # append by row bind 
  } 
  
  write.csv(df.allitems, "E:/all_items.csv") 

--- 
  
I have done some SparkR code: 
  data1 <- read.csv("D:/Data_sale_quantity_mini.csv") # read in R 
  df_1 <- createDataFrame(sqlContext, data1) # converts the R data.frame to a Spark DF 
  factors <- distinct(df_1) # remove duplicates 
  
# for select I used: 
  df_2 <- select(distinctDF, "DC_City", "Itemdescription", "ItemNo", "date", 
"Year", "SalesQuantity") # select action 

I don't know how to: 
  1) create an empty SparkR DF 
  2) use a for loop in SparkR 
  3) change the date format 
  4) find the length() of a Spark DF 
  5) use rbind in SparkR 
  
Can you help me out in doing the above code in SparkR?


  was:
Data set:  
  
DC_City Dc_Code ItemNo  Itemdescription dat   
Month YearSalesQuantity 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
9/16/2012   9-Sep 2012   1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
12/21/2012  12-Dec2012 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/12/2013   1-Jan   2013 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/27/2013   1-Jan   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/1/20132-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/12/2013   2-Feb   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/13/2013   2-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/14/2013   2-Feb   2013 1 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/15/2013   2-Feb   2013 8 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/16/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/17/2013   2-Feb   2013 19 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 

[jira] [Updated] (SPARK-8629) R code in SparkR

2015-06-25 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun updated SPARK-8629:

Description: 
Data set:  
  
DC_City Dc_Code ItemNo  Itemdescription dat   
Month YearSalesQuantity 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
9/16/2012   9-Sep 2012   1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
12/21/2012  12-Dec2012 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/12/2013   1-Jan   2013 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/27/2013   1-Jan   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/1/20132-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/12/2013   2-Feb   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/13/2013   2-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/14/2013   2-Feb   2013 1 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/15/2013   2-Feb   2013 8 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/16/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/17/2013   2-Feb   2013 19 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/18/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/19/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/20/2013   2-Feb   2013 16 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/21/2013   2-Feb   2013 25 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/22/2013   2-Feb   2013 19 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/23/2013   2-Feb   2013 17 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/24/2013   2-Feb   2013 39 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/25/2013   2-Feb   2013 23 


Code I used in R:

  data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE) 
  factors <- unique(data$ItemNo) 
  df.allitems <- data.frame() 
  for (i in 1:length(factors)) 
  { 
    data1 <- filter(data, ItemNo == factors[[i]]) 
    data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, 
SalesQuantity) 
    data2$date <- as.Date(data2$date, format = "%m/%d/%Y") 
    data3 <- data2[order(data2$date), ] 
    df.allitems <- rbind(data3, df.allitems) # append by row bind 
  } 
  
  write.csv(df.allitems, "E:/all_items.csv") 

You can see the code clearly at:
http://apache-spark-user-list.1001560.n3.nabble.com/Convert-R-code-into-SparkR-code-for-spark-1-4-version-tp23489.html
  
I have done some SparkR code: 
  data1 <- read.csv("D:/Data_sale_quantity_mini.csv") # read in R 
  df_1 <- createDataFrame(sqlContext, data1) # converts the R data.frame to a Spark DF 
  factors <- distinct(df_1) # remove duplicates 
  
# for select I used: 
  df_2 <- select(distinctDF, "DC_City", "Itemdescription", "ItemNo", "date", 
"Year", "SalesQuantity") # select action 

I don't know how to: 
  1) create an empty SparkR DF 
  2) use a for loop in SparkR 
  3) change the date format 
  4) find the length() of a Spark DF 
  5) use rbind in SparkR 
  
Can you help me out in doing the above code in SparkR?


  was:
Data set:  
  
DC_City Dc_Code ItemNo  Itemdescription dat   
Month YearSalesQuantity 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
9/16/2012   9-Sep 2012   1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
12/21/2012  12-Dec2012 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/12/2013   1-Jan   2013 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/27/2013   1-Jan   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/1/20132-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/12/2013   2-Feb   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 

[jira] [Updated] (SPARK-8629) R code in SparkR

2015-06-25 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun updated SPARK-8629:

Description: 
Data set:  
  
DC_City Dc_Code ItemNo  Itemdescription dat   
Month YearSalesQuantity 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
9/16/2012   9-Sep 2012   1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
12/21/2012  12-Dec2012 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/12/2013   1-Jan   2013 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/27/2013   1-Jan   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/1/20132-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/12/2013   2-Feb   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/13/2013   2-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/14/2013   2-Feb   2013 1 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/15/2013   2-Feb   2013 8 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/16/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/17/2013   2-Feb   2013 19 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/18/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/19/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/20/2013   2-Feb   2013 16 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/21/2013   2-Feb   2013 25 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/22/2013   2-Feb   2013 19 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/23/2013   2-Feb   2013 17 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/24/2013   2-Feb   2013 39 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/25/2013   2-Feb   2013 23 


Code I used in R:

  data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE) 
  factors <- unique(data$ItemNo) 
  df.allitems <- data.frame() 
  for (i in 1:length(factors)) 
  { 
    data1 <- filter(data, ItemNo == factors[[i]]) 
    data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, 
SalesQuantity) # select particular columns 
    data2$date <- as.Date(data2$date, format = "%m/%d/%Y") # format the date 
    data3 <- data2[order(data2$date), ] # order ascending by date 
    df.allitems <- rbind(data3, df.allitems) # append by row bind 
  } 
  
  write.csv(df.allitems, "E:/all_items.csv") 

--- 
  
I have done some SparkR code: 
  data1 <- read.csv("D:/Data_sale_quantity_mini.csv") # read in R 
  df_1 <- createDataFrame(sqlContext, data1) # converts the R data.frame to a Spark DF 
  factors <- distinct(df_1) # remove duplicates 
  
# for select I used: 
  df_2 <- select(distinctDF, "DC_City", "Itemdescription", "ItemNo", "date", 
"Year", "SalesQuantity") # select action 

I don't know how to: 
  1) create an empty SparkR DF 
  2) use a for loop in SparkR 
  3) change the date format 
  4) find the length() of a Spark DF 
  5) use rbind in SparkR 
  
Can you help me out in doing the above code in SparkR?


  was:
Data set:  
  
DC_City Dc_Code ItemNo  Itemdescription dat   
Month YearSalesQuantity 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
9/16/2012   9-Sep 2012   1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
12/21/2012  12-Dec2012 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/12/2013   1-Jan   2013 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/27/2013   1-Jan   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/1/20132-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/12/2013   2-Feb   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/13/2013   2-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/14/2013   2-Feb   2013 1 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/15/2013   2-Feb   2013 8 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/16/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/17/2013   2-Feb   2013 19 

[jira] [Updated] (SPARK-8629) R code in SparkR

2015-06-25 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun updated SPARK-8629:

Description: 
Data set:  
  
DC_City Dc_Code ItemNo  Itemdescription dat   
Month YearSalesQuantity 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
9/16/2012   9-Sep 2012   1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
12/21/2012  12-Dec2012 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/12/2013   1-Jan   2013 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/27/2013   1-Jan   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/1/20132-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/12/2013   2-Feb   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/13/2013   2-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/14/2013   2-Feb   2013 1 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/15/2013   2-Feb   2013 8 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/16/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/17/2013   2-Feb   2013 19 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/18/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/19/2013   2-Feb   2013 18 
Hyderabad   11  15012   more. Value Chana Dal 1 Kg. 
2/20/2013   2-Feb   2013 16 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/21/2013   2-Feb   2013 25 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/22/2013   2-Feb   2013 19 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/23/2013   2-Feb   2013 17 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/24/2013   2-Feb   2013 39 
Hyderabad   11  15013   more. Value Chana Dal 1 Kg. 
2/25/2013   2-Feb   2013 23 


Code I used in R:

  data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE) 
  factors <- unique(data$ItemNo) 
  df.allitems <- data.frame() 
  for (i in 1:length(factors)) 
  { 
    data1 <- filter(data, ItemNo == factors[[i]]) 
    data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, 
SalesQuantity) 
    data2$date <- as.Date(data2$date, format = "%m/%d/%Y") 
    data3 <- data2[order(data2$date), ] 
    df.allitems <- rbind(data3, df.allitems) # append by row bind 
  } 
  
  write.csv(df.allitems, "E:/all_items.csv") 

--- 
  
I have done some SparkR code: 
  data1 <- read.csv("D:/Data_sale_quantity_mini.csv") # read in R 
  df_1 <- createDataFrame(sqlContext, data1) # converts the R data.frame to a Spark DF 
  factors <- distinct(df_1) # remove duplicates 
  
# for select I used: 
  df_2 <- select(distinctDF, "DC_City", "Itemdescription", "ItemNo", "date", 
"Year", "SalesQuantity") # select action 

I don't know how to: 
  1) create an empty SparkR DF 
  2) use a for loop in SparkR 
  3) change the date format 
  4) find the length() of a Spark DF 
  5) use rbind in SparkR 
  
Can you help me out in doing the above code in SparkR?


  was:
Data set:  
  
DC_City Dc_Code ItemNo  Itemdescription dat   
Month YearSalesQuantity 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
9/16/2012   9-Sep 2012   1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
12/21/2012  12-Dec2012 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/12/2013   1-Jan   2013 1 
Hyderabad   11  15010   more. Value Chana Dal 1 Kg. 
1/27/2013   1-Jan   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/1/20132-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/12/2013   2-Feb   2013 3 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/13/2013   2-Feb   2013 2 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/14/2013   2-Feb   2013 1 
Hyderabad   11  15011   more. Value Chana Dal 1 Kg. 
2/15/2013   2-Feb   2013 8 
Hyderabad  

[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-22 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595716#comment-14595716
 ] 

Arun commented on SPARK-8409:
-

Dear Shivaram,

Using the Spark shell, installing the package with .\spark-shell --packages 
com.databricks:spark-csv_2.11:1.0.3 

works: the csv package is installed successfully.

The problem is with the sparkR shell only. Kindly suggest how to get it 
installed in the sparkR shell.

E:\spark-1.4.0-bin-hadoop2.6\bin.\spark-shell --packages com.databricks:spark-c
sv_2.11:1.0.3
Ivy Default Cache set to: C:\Users\acer1\.ivy2\cache
The jars for the packages stored in: C:\Users\acer1\.ivy2\jars
:: loading settings :: url = jar:file:/E:/spark-1.4.0-bin-hadoop2.6/lib/spark-as
sembly-1.4.0-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.databricks#spark-csv_2.11;1.0.3 in central
found org.apache.commons#commons-csv;1.1 in central
downloading https://repo1.maven.org/maven2/com/databricks/spark-csv_2.11/1.0.3/s
park-csv_2.11-1.0.3.jar ...
[SUCCESSFUL ] com.databricks#spark-csv_2.11;1.0.3!spark-csv_2.11.jar (70
5ms)
downloading https://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.1/co
mmons-csv-1.1.jar ...
[SUCCESSFUL ] org.apache.commons#commons-csv;1.1!commons-csv.jar (479ms)

:: resolution report :: resolve 11565ms :: artifacts dl 1200ms
:: modules in use:
com.databricks#spark-csv_2.11;1.0.3 from central in [default]
org.apache.commons#commons-csv;1.1 from central in [default]
-
|  |modules||   artifacts   |
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
-
|  default |   2   |   2   |   2   |   0   ||   2   |   2   |
-
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
2 artifacts copied, 0 already retrieved (90kB/63ms)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.li
b.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more in
fo.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/22 15:33:22 INFO SecurityManager: Changing view acls to: acer1
15/06/22 15:33:22 INFO SecurityManager: Changing modify acls to: acer1
15/06/22 15:33:22 INFO SecurityManager: SecurityManager: authentication disabled
; ui acls disabled; users with view permissions: Set(acer1); users with modify p
ermissions: Set(acer1)
15/06/22 15:33:22 INFO HttpServer: Starting HTTP Server
15/06/22 15:33:22 INFO Utils: Successfully started service 'HTTP class server' o
n port 53987.
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
  /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
15/06/22 15:33:27 INFO SparkContext: Running Spark version 1.4.0
15/06/22 15:33:27 INFO SecurityManager: Changing view acls to: acer1
15/06/22 15:33:27 INFO SecurityManager: Changing modify acls to: acer1
15/06/22 15:33:27 INFO SecurityManager: SecurityManager: authentication disabled
; ui acls disabled; users with view permissions: Set(acer1); users with modify p
ermissions: Set(acer1)
15/06/22 15:33:28 INFO Slf4jLogger: Slf4jLogger started
15/06/22 15:33:28 INFO Remoting: Starting remoting
15/06/22 15:33:28 INFO Remoting: Remoting started; listening on addresses :[akka
.tcp://sparkDriver@192.168.88.1:54000]
15/06/22 15:33:28 INFO Utils: Successfully started service 'sparkDriver' on port
 54000.
15/06/22 15:33:28 INFO SparkEnv: Registering MapOutputTracker
15/06/22 15:33:28 INFO SparkEnv: Registering BlockManagerMaster
15/06/22 15:33:28 INFO DiskBlockManager: Created local directory at C:\Users\ace
r1\AppData\Local\Temp\spark-7805dd92-cc04-44f0-9b1c-2993939f7b21\blockmgr-b7c44c
e9-7ad7-4a03-b041-7a0aa491de10
15/06/22 15:33:28 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/06/22 15:33:28 INFO HttpFileServer: HTTP File server directory is C:\Users\ac
er1\AppData\Local\Temp\spark-7805dd92-cc04-44f0-9b1c-2993939f7b21\httpd-a833b562
-71a5-400e-85e3-821f4760348c
15/06/22 15:33:28 INFO HttpServer: Starting HTTP Server
15/06/22 15:33:28 INFO Utils: Successfully started service 'HTTP file server' on
 port 54001.
15/06/22 15:33:28 INFO SparkEnv: Registering OutputCommitCoordinator
15/06/22 15:33:28 INFO 

[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-22 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595568#comment-14595568
 ] 

Arun commented on SPARK-8409:
-

Hi Shivram,

I tried putting those files in the destination discussed in the earlier 
conversation, but that didn't work.

Then I tried installing the csv package from my home connection, a private 
internet link with no restrictions or proxies, but I got the following errors. 
Can you run .\sparkR --packages com.databricks:spark-csv_2.10:1.0.3 in a 
Windows environment, to check whether the issue is in the network or in the 
Windows environment?

E:\spark-1.4.0-bin-hadoop2.6\bin.\sparkR --packages com.databricks:spark-csv_2.
10:1.0.3

R version 3.2.1 (2015-06-18) -- World-Famous Astronaut
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Launching java with spark-submit command E:\spark-1.4.0-bin-hadoop2.6\bin\../bin
/spark-submit.cmd  --packages com.databricks:spark-csv_2.10:1.0.3 sparkr-sh
ell  C:\Users\acer1\AppData\Local\Temp\Rtmp0gENwW\backend_port198831cf7692
Ivy Default Cache set to: C:\Users\acer1\.ivy2\cache
The jars for the packages stored in: C:\Users\acer1\.ivy2\jars
:: loading settings :: url = jar:file:/E:/spark-1.4.0-bin-hadoop2.6/lib/spark-as
sembly-1.4.0-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
You probably access the destination server through a proxy server that is not we
ll configured.
You probably access the destination server through a proxy server that is not we
ll configured.
You probably access the destination server through a proxy server that is not we
ll configured.
You probably access the destination server through a proxy server that is not we
ll configured.
:: resolution report :: resolve 23610ms :: artifacts dl 0ms
:: modules in use:
-
|  |modules||   artifacts   |
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
-
|  default |   1   |   0   |   0   |   0   ||   0   |   0   |
-

:: problems summary ::
 WARNINGS
Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/com/d
atabricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom

Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/com/d
atabricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar

Host dl.bintray.com not found. url=http://dl.bintray.com/spark-packages/
maven/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom

Host dl.bintray.com not found. url=http://dl.bintray.com/spark-packages/
maven/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar

module not found: com.databricks#spark-csv_2.10;1.0.3

 local-m2-cache: tried

  file:/C:/Users/acer1/.m2/repository/com/databricks/spark-csv_2.10/1.0.
3/spark-csv_2.10-1.0.3.pom

  -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:

  file:/C:/Users/acer1/.m2/repository/com/databricks/spark-csv_2.10/1.0.
3/spark-csv_2.10-1.0.3.jar

 local-ivy-cache: tried

  -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:

  file:/C:/Users/acer1/.ivy2/local/com.databricks\spark-csv_2.10\1.0.3\j
ars\spark-csv_2.10.jar

 central: tried

  https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spa
rk-csv_2.10-1.0.3.pom

  -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:

  https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spa
rk-csv_2.10-1.0.3.jar

 spark-packages: tried

  http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.
10/1.0.3/spark-csv_2.10-1.0.3.pom

  -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:

  http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.
10/1.0.3/spark-csv_2.10-1.0.3.jar
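
If a proxy really is in the way, one common workaround (an assumption here, not 
something verified in this thread) is to pass the standard JVM proxy properties 
to the driver that performs the Ivy resolution; the proxy host and port below 
are placeholders:

  .\sparkR --driver-java-options "-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=8080" --packages com.databricks:spark-csv_2.10:1.0.3

On a connection with no proxy at all, the same "Host repo1.maven.org not found" 
warnings point instead at DNS or general connectivity from that machine.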


[jira] [Comment Edited] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-22 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595568#comment-14595568
 ] 

Arun edited comment on SPARK-8409 at 6/22/15 10:19 AM:
---

Hi Shivram,

I tried putting those files in the destination discussed in the earlier 
conversation, but that didn't work.

Then I tried installing the csv package from my home connection, a private 
internet link with no restrictions or proxies, but I got the following errors 
in the sparkR shell.

E:\spark-1.4.0-bin-hadoop2.6\bin.\sparkR --packages com.databricks:spark-csv_2.
10:1.0.3

R version 3.2.1 (2015-06-18) -- World-Famous Astronaut
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Launching java with spark-submit command E:\spark-1.4.0-bin-hadoop2.6\bin\../bin
/spark-submit.cmd  --packages com.databricks:spark-csv_2.10:1.0.3 sparkr-sh
ell  C:\Users\acer1\AppData\Local\Temp\Rtmp0gENwW\backend_port198831cf7692
Ivy Default Cache set to: C:\Users\acer1\.ivy2\cache
The jars for the packages stored in: C:\Users\acer1\.ivy2\jars
:: loading settings :: url = jar:file:/E:/spark-1.4.0-bin-hadoop2.6/lib/spark-as
sembly-1.4.0-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
You probably access the destination server through a proxy server that is not we
ll configured.
You probably access the destination server through a proxy server that is not we
ll configured.
You probably access the destination server through a proxy server that is not we
ll configured.
You probably access the destination server through a proxy server that is not we
ll configured.
:: resolution report :: resolve 23610ms :: artifacts dl 0ms
:: modules in use:
-
|  |modules||   artifacts   |
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
-
|  default |   1   |   0   |   0   |   0   ||   0   |   0   |
-

:: problems summary ::
 WARNINGS
Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/com/d
atabricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom

Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/com/d
atabricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar

Host dl.bintray.com not found. url=http://dl.bintray.com/spark-packages/
maven/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom

Host dl.bintray.com not found. url=http://dl.bintray.com/spark-packages/
maven/com/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar

module not found: com.databricks#spark-csv_2.10;1.0.3

 local-m2-cache: tried

  file:/C:/Users/acer1/.m2/repository/com/databricks/spark-csv_2.10/1.0.
3/spark-csv_2.10-1.0.3.pom

  -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:

  file:/C:/Users/acer1/.m2/repository/com/databricks/spark-csv_2.10/1.0.
3/spark-csv_2.10-1.0.3.jar

 local-ivy-cache: tried

  -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:

  file:/C:/Users/acer1/.ivy2/local/com.databricks\spark-csv_2.10\1.0.3\j
ars\spark-csv_2.10.jar

 central: tried

  https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spa
rk-csv_2.10-1.0.3.pom

  -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:

  https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spa
rk-csv_2.10-1.0.3.jar

 spark-packages: tried

  http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.
10/1.0.3/spark-csv_2.10-1.0.3.pom

  -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:

  http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.
10/1.0.3/spark-csv_2.10-1.0.3.jar

::

::  UNRESOLVED DEPENDENCIES ::


[jira] [Comment Edited] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-18 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591723#comment-14591723
 ] 

Arun edited comment on SPARK-8409 at 6/18/15 12:48 PM:
---

Hi Shivaram,

I got the error below when I did as you suggested, reading a csv file from 
HDFS. Kindly note that the HDFS link I have given is syntactically correct.

TIA
df_1 <- read.df(sqlContext, 
  "hdfs://ABRLMISDEV:8020/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv",
  "com.databricks.spark.csv", header = "true")
15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandl
er.scala:127)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:74)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:36)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChanne
lInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToM
essageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessage
Decoder.java:163)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChanne
lPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to load class for data source: com
.databricks.spark.csv
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl
.scala:216)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)

at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1230)
... 25 more
Error: returnStatus == 0 is not TRUE


was (Author: b.arunguna...@gmail.com):
Hi Shivram,

I got the error below when I did as you suggested, reading a csv file from 
HDFS. Kindly note that the HDFS link I have given is syntactically correct.

TIA
df_1 <- read.df(sqlContext, 
  "hdfs://ABRLMISDEV:8020/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv",
  "com.databricks.spark.csv", header = "true")
15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandl
er.scala:127)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:74)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:36)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChanne
lInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at 

[jira] [Issue Comment Deleted] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-18 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun updated SPARK-8409:

Comment: was deleted

(was: Hi Shivaram,

I got this error when I did as you suggested. Kindly check whether the HDFS 
connection I made is correct.

TIA

df_1 <- read.df(sqlContext, 
  "hdfs://ABRLMISDEV:8020/app.admin/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv",
  "com.databricks.spark.csv", header = "true")
15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandl
er.scala:127)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:74)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:36)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChanne
lInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToM
essageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessage
Decoder.java:163)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChanne
lPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to load class for data source: com
.databricks.spark.csv
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl
.scala:216)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)

at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1230)
... 25 more
Error: returnStatus == 0 is not TRUE)

  In windows cant able to read .csv or .json files using read.df()
 -

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical
  Labels: build

 Hi, 
 In SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source="json", 
  header="false") 
 I have tried various filetypes (csv, txt), all fail.   
  in sparkR of spark 1.4 for eg.) df_1 <- read.df(sqlContext, 
 "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
  source = "csv")
 RESPONSE: ERROR RBackendHandler: load on 1 failed 
 BELOW THE WHOLE RESPONSE: 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with 
 curMem=0, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 173.4 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with 
 curMem=177600, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 16.2 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO BlockManagerInfo: Added 

[jira] [Comment Edited] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-18 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591723#comment-14591723
 ] 

Arun edited comment on SPARK-8409 at 6/18/15 12:48 PM:
---

Hi Shivaram,

I got the error below when I did as you suggested, reading a csv file from 
HDFS. Kindly note that the HDFS path I have given is syntactically correct.

TIA
df_1 <- read.df(sqlContext, 
  "hdfs://ABRLMISDEV:8020/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv",
  "com.databricks.spark.csv", header = "true")
15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandl
er.scala:127)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:74)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:36)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChanne
lInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToM
essageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessage
Decoder.java:163)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChanne
lPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to load class for data source: com
.databricks.spark.csv
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl
.scala:216)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)

at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1230)
... 25 more
Error: returnStatus == 0 is not TRUE


was (Author: b.arunguna...@gmail.com):
Hi Shivaram,

I got the error below when I did as you suggested, reading a csv file from 
HDFS. Kindly note that the HDFS link I have given is syntactically correct.

TIA
df_1 <- read.df(sqlContext, 
  "hdfs://ABRLMISDEV:8020/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv",
  "com.databricks.spark.csv", header = "true")
15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandl
er.scala:127)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:74)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:36)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChanne
lInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at 

[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-18 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591715#comment-14591715
 ] 

Arun commented on SPARK-8409:
-

Hi Shivaram,

I got this error when I did as you suggested. Kindly check whether the HDFS 
connection I made is correct.

TIA

df_1 <- read.df(sqlContext, 
  "hdfs://ABRLMISDEV:8020/app.admin/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv",
  "com.databricks.spark.csv", header = "true")
15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandl
er.scala:127)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:74)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:36)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChanne
lInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToM
essageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessage
Decoder.java:163)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChanne
lPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to load class for data source: com
.databricks.spark.csv
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl
.scala:216)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)

at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1230)
... 25 more
Error: returnStatus == 0 is not TRUE

  In windows cant able to read .csv or .json files using read.df()
 -

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical
  Labels: build

 Hi, 
 In SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source="json", 
  header="false") 
 I have tried various filetypes (csv, txt), all fail.   
  in sparkR of spark 1.4 for eg.) df_1 <- read.df(sqlContext, 
 "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
  source = "csv")
 RESPONSE: ERROR RBackendHandler: load on 1 failed 
 BELOW THE WHOLE RESPONSE: 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with 
 curMem=0, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 173.4 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with 
 curMem=177600, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 16.2 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO 

[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-18 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591782#comment-14591782
 ] 

Arun commented on SPARK-8409:
-

I think the reason behind the above error is that the csv package was never 
downloaded or installed. I tried to install the package separately using the 
following command. Is there any other method I can use to install the package?

.\bin\sparkR --packages com.databricks:spark-csv_2.10:1.0.3

R version 3.2.0 (2015-04-16) -- Full of Ingredients
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Warning: namespace 'SparkR' is not available and has been replaced
by .GlobalEnv when processing object 'df'
[Previously saved workspace restored]

Launching java with spark-submit command E:\setup\spark-1.4.0-bin-hadoop2.6\spar
k-1.4.0-bin-hadoop2.6\bin\../bin/spark-submit.cmd  --packages com.databricks:
spark-csv_2.10:1.0.3 sparkr-shell  C:\Users\RAJESH~1.KOD\AppData\Local\Temp\6
\RtmpgTFIOz\backend_port987858e35a
Ivy Default Cache set to: C:\Users\rajesh.kodam-v\.ivy2\cache
The jars for the packages stored in: C:\Users\rajesh.kodam-v\.ivy2\jars
:: loading settings :: url = jar:file:/E:/setup/spark-1.4.0-bin-hadoop2.6/spark-
1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/apache/ivy/cor
e/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
:: resolution report :: resolve 96999ms :: artifacts dl 0ms
:: modules in use:
-
|  |modules||   artifacts   |
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
-
|  default |   1   |   0   |   0   |   0   ||   0   |   0   |
-

:: problems summary ::
 WARNINGS
module not found: com.databricks#spark-csv_2.10;1.0.3

 local-m2-cache: tried

  file:/C:/Users/rajesh.kodam-v/.m2/repository/com/databricks/spark-csv_
2.10/1.0.3/spark-csv_2.10-1.0.3.pom

  -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:

  file:/C:/Users/rajesh.kodam-v/.m2/repository/com/databricks/spark-csv_
2.10/1.0.3/spark-csv_2.10-1.0.3.jar

 local-ivy-cache: tried

  -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:

  file:/C:/Users/rajesh.kodam-v/.ivy2/local/com.databricks\spark-csv_2.1
0\1.0.3\jars\spark-csv_2.10.jar

 central: tried

  https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spa
rk-csv_2.10-1.0.3.pom

  -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:

  https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.0.3/spa
rk-csv_2.10-1.0.3.jar

 spark-packages: tried

  http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.
10/1.0.3/spark-csv_2.10-1.0.3.pom

  -- artifact com.databricks#spark-csv_2.10;1.0.3!spark-csv_2.10.jar:

  http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.
10/1.0.3/spark-csv_2.10-1.0.3.jar

::

::  UNRESOLVED DEPENDENCIES ::

::

:: com.databricks#spark-csv_2.10;1.0.3: not found

::


 ERRORS
Server access error at url https://repo1.maven.org/maven2/com/databricks
/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom (java.net.ConnectException: Conne
ction timed out: connect)

Server access error at url https://repo1.maven.org/maven2/com/databricks
/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.jar (java.net.ConnectException: Conne
ction timed out: connect)

Server access error at url http://dl.bintray.com/spark-packages/maven/co
m/databricks/spark-csv_2.10/1.0.3/spark-csv_2.10-1.0.3.pom (java.net.ConnectExce
ption: Connection timed out: connect)

Server access error at url 
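
Since the resolver probes the local m2 cache before going to the network (see 
the local-m2-cache paths it tried above), a possible offline workaround, 
assuming the artifacts can be downloaded on a machine that does have access, 
is to drop both the pom and the jar into that exact layout and re-run the same 
--packages command:

  C:\Users\rajesh.kodam-v\.m2\repository\com\databricks\spark-csv_2.10\1.0.3\spark-csv_2.10-1.0.3.pom
  C:\Users\rajesh.kodam-v\.m2\repository\com\databricks\spark-csv_2.10\1.0.3\spark-csv_2.10-1.0.3.jar

spark-csv 1.0.3 also pulls in org.apache.commons:commons-csv:1.1 (visible in 
the successful resolution earlier in this thread), so its pom and jar would 
need the same treatment under org\apache\commons\commons-csv\1.1\.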

[jira] [Reopened] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-18 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun reopened SPARK-8409:
-

Hi Shivram,

I got the error below when I did as you suggested, reading a csv file from 
HDFS. Kindly note that the HDFS link I have given is syntactically correct.

TIA
df_1 <- read.df(sqlContext, 
  "hdfs://ABRLMISDEV:8020/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv",
  "com.databricks.spark.csv", header = "true")
15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandl
er.scala:127)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:74)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:36)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChanne
lInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToM
essageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessage
Decoder.java:163)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChanne
lPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to load class for data source: com
.databricks.spark.csv
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl
.scala:216)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)

at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1230)
... 25 more
Error: returnStatus == 0 is not TRUE

  In windows cant able to read .csv or .json files using read.df()
 -

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical
  Labels: build

 Hi, 
 In SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source="json", 
  header="false") 
 I have tried various filetypes (csv, txt), all fail.   
  in sparkR of spark 1.4 for eg.) df_1 <- read.df(sqlContext, 
 "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
  source = "csv")
 RESPONSE: ERROR RBackendHandler: load on 1 failed 
 BELOW THE WHOLE RESPONSE: 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with 
 curMem=0, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 173.4 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with 
 curMem=177600, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 16.2 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO BlockManagerInfo: Added 

[jira] [Comment Edited] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-18 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591723#comment-14591723
 ] 

Arun edited comment on SPARK-8409 at 6/18/15 1:27 PM:
--

Hi Shivaram,

I got the error below when I did as you suggested, reading a csv file from 
HDFS. Kindly note that the HDFS path I have given is syntactically correct.

TIA
df_1 <- read.df(sqlContext, 
  "hdfs://ABRLMISDEV:8020/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv",
  "com.databricks.spark.csv", header = "true")
15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandl
er.scala:127)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:74)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:36)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChanne
lInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToM
essageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessage
Decoder.java:163)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abstra
ctChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChanne
lPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(Abstra
ctNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.jav
a:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEve
ntLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.ja
va:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread
EventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorato
r.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to load class for data source: com
.databricks.spark.csv
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl
.scala:216)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)

at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1230)
... 25 more
Error: returnStatus == 0 is not TRUE


was (Author: b.arunguna...@gmail.com):
Hi Shivaram,

I got the error below when I did as you suggested, reading a csv file from 
HDFS. Kindly note that the HDFS path I have given is syntactically correct.

TIA
df_1 <- read.df(sqlContext, 
  "hdfs://ABRLMISDEV:8020/sparkR/Data_sale_quantity_Cleaned_Missing_dates.csv",
  "com.databricks.spark.csv", header = "true")
15/06/18 17:55:53 ERROR RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandl
er.scala:127)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:74)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.s
cala:36)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChanne
lInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Abst
ractChannelHandlerContext.java:333)
at 

[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-18 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592050#comment-14592050
 ] 

Arun commented on SPARK-8409:
-

I am using a Windows machine and I have downloaded the spark-csv_2.10:1.0.3 jar. 
If I place this jar file in the Spark lib folder, will it work properly?
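
One way to use a hand-downloaded jar, rather than relying on the lib folder, is 
to pass it to sparkR via --jars. A minimal sketch, assuming both jars were 
fetched by hand (the E:\jars paths are placeholders, and --jars does no 
dependency resolution, so the commons-csv 1.1 jar that spark-csv depends on 
has to be listed as well):

  .\bin\sparkR --jars E:\jars\spark-csv_2.10-1.0.3.jar,E:\jars\commons-csv-1.1.jar

  # then, inside the sparkR shell:
  df <- read.df(sqlContext, "E:/data/example.csv",
                source = "com.databricks.spark.csv", header = "true")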

  In windows cant able to read .csv or .json files using read.df()
 -

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical
  Labels: build

 Hi, 
 In SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source="json", 
  header="false") 
 I have tried various filetypes (csv, txt), all fail.   
  in sparkR of spark 1.4 for eg.) df_1 <- read.df(sqlContext, 
 "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
  source = "csv")
 RESPONSE: ERROR RBackendHandler: load on 1 failed 
 BELOW THE WHOLE RESPONSE: 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with 
 curMem=0, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 173.4 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with 
 curMem=177600, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 16.2 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
 on localhost:37142 (size: 16.2 KB, free: 265.4 MB) 
 15/06/16 08:09:13 INFO SparkContext: Created broadcast 0 from load at 
 NativeMethodAccessorImpl.java:-2 
 15/06/16 08:09:16 WARN DomainSocketFactory: The short-circuit local reads 
 feature cannot be used because libhadoop cannot be loaded. 
 15/06/16 08:09:17 ERROR RBackendHandler: load on 1 failed 
 java.lang.reflect.InvocationTargetException 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  
 at java.lang.reflect.Method.invoke(Method.java:606) 
 at 
 org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
  
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74) 
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36) 
 at 
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
  
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  
 at 
 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
  
 at java.lang.Thread.run(Thread.java:745) 
 Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does 
 not exist: hdfs://smalldata13.hdp:8020/home/esten/ami/usaf.json 
 at 
 

[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-18 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592067#comment-14592067
 ] 

Arun commented on SPARK-8409:
-

Ok Shivaram, I will try it out tomorrow and let you know. Thanks.

  In windows cant able to read .csv or .json files using read.df()
 -

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical
  Labels: build

 Hi, 
 In SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source="json", 
  header="false") 
 I have tried various filetypes (csv, txt), all fail.   
  in sparkR of spark 1.4 for eg.) df_1 <- read.df(sqlContext, 
 "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
  source = "csv")
 RESPONSE: ERROR RBackendHandler: load on 1 failed 
 BELOW THE WHOLE RESPONSE: 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with 
 curMem=0, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 173.4 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with 
 curMem=177600, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 16.2 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
 on localhost:37142 (size: 16.2 KB, free: 265.4 MB) 
 15/06/16 08:09:13 INFO SparkContext: Created broadcast 0 from load at 
 NativeMethodAccessorImpl.java:-2 
 15/06/16 08:09:16 WARN DomainSocketFactory: The short-circuit local reads 
 feature cannot be used because libhadoop cannot be loaded. 
 15/06/16 08:09:17 ERROR RBackendHandler: load on 1 failed 
 java.lang.reflect.InvocationTargetException 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  
 at java.lang.reflect.Method.invoke(Method.java:606) 
 at 
 org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
  
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74) 
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36) 
 at 
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
  
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  
 at 
 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
  
 at java.lang.Thread.run(Thread.java:745) 
 Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does 
 not exist: hdfs://smalldata13.hdp:8020/home/esten/ami/usaf.json 
 at 
 org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
  
 at 
 

[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-17 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590119#comment-14590119
 ] 

Arun commented on SPARK-8409:
-

Ok, thanks a lot Shivaram.

  In windows cant able to read .csv or .json files using read.df()
 -

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical
  Labels: build

 Hi, 
 In SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source="json", 
  header="false") 
 I have tried various filetypes (csv, txt), all fail.   
  in sparkR of spark 1.4 for eg.) df_1 <- read.df(sqlContext, 
 "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
  source = "csv")
 RESPONSE: ERROR RBackendHandler: load on 1 failed 
 BELOW THE WHOLE RESPONSE: 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(177600) called with 
 curMem=0, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 173.4 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO MemoryStore: ensureFreeSpace(16545) called with 
 curMem=177600, maxMem=278302556 
 15/06/16 08:09:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 16.2 KB, free 265.2 MB) 
 15/06/16 08:09:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
 on localhost:37142 (size: 16.2 KB, free: 265.4 MB) 
 15/06/16 08:09:13 INFO SparkContext: Created broadcast 0 from load at 
 NativeMethodAccessorImpl.java:-2 
 15/06/16 08:09:16 WARN DomainSocketFactory: The short-circuit local reads 
 feature cannot be used because libhadoop cannot be loaded. 
 15/06/16 08:09:17 ERROR RBackendHandler: load on 1 failed 
 java.lang.reflect.InvocationTargetException 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  
 at java.lang.reflect.Method.invoke(Method.java:606) 
 at 
 org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
  
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74) 
 at 
 org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36) 
 at 
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  
 at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
  
 at 
 io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
  
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
  
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  
 at 
 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
  
 at java.lang.Thread.run(Thread.java:745) 
 Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does 
 not exist: hdfs://smalldata13.hdp:8020/home/esten/ami/usaf.json 
 at 
 org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
  
 at 
 org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228) 
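The root cause shown in this trace is the InvalidInputException: the bare path /home/esten/ami/usaf.json is resolved against the cluster's default filesystem (HDFS), where the file does not exist. As a hedged sketch (not part of the original report, and assuming the JSON file actually lives on the driver's local disk), one workaround is to pass an explicit URI so read.df looks in the intended filesystem:
{code}
# Minimal sketch, assuming the file exists on the driver's local filesystem:
# an explicit file: URI keeps Spark from resolving the path against HDFS.
mydf <- read.df(sqlContext, "file:///home/esten/ami/usaf.json", source = "json")

# Alternatively, copy the file into HDFS first (e.g. with `hadoop fs -put`)
# and read it from there; the HDFS path below is only illustrative.
mydf <- read.df(sqlContext, "hdfs:///user/esten/usaf.json", source = "json")
{code}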
   

[jira] [Updated] (SPARK-8409) In windows cant able to read .csv or .json files using read.df() in sparkR of spark 1.4 for eg.) df_1- read.df(sqlContext, E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4

2015-06-17 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun updated SPARK-8409:

Summary:  In windows cant able to read .csv or .json files using read.df() 
in sparkR of spark 1.4 for eg.) df_1- read.df(sqlContext, 
E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv,
 source = csv)  (was:  In windows cant able to read .csv files using 
read.df() in sparkR of spark 1.4 for eg.) df_1- read.df(sqlContext, 
E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv,
 source = csv))

  In windows cant able to read .csv or .json files using read.df() in sparkR 
 of spark 1.4 for eg.) df_1- read.df(sqlContext, 
 E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv,
  source = csv)
 -

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical
  Labels: build

 Hi, 
 In the SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", header = "false") 
 I have tried various file types (csv, txt); all fail. 
 RESPONSE: ERROR RBackendHandler: load on 1 failed 
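For the Windows .csv case recorded in the updated summary, note that "csv" is not a built-in data source in Spark 1.4. A hedged sketch, assuming the external spark-csv package has been added when launching SparkR (the package coordinates below are illustrative, not from the report):
{code}
# Launch SparkR with the CSV data source on the classpath, e.g.:
#   bin\sparkR --packages com.databricks:spark-csv_2.10:1.0.3
df_1 <- read.df(sqlContext,
                "file:///E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv",
                source = "com.databricks.spark.csv",
                header = "true")
{code}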

[jira] [Updated] (SPARK-8409) In windows cant able to read .csv or .json files using read.df() in sparkR of spark 1.4 for eg.) df_1- read.df(sqlContext, E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4

2015-06-17 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun updated SPARK-8409:

Description: 
Hi, 
In the SparkR shell, I invoke: 
 mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", header = "false") 
I have tried various file types (csv, txt); all fail. 

RESPONSE: ERROR RBackendHandler: load on 1 failed 

[jira] [Created] (SPARK-8409) i cant able to read .csv files using read.df() in sparkR of spark 1.4 for eg.) mydf-read.df(sqlContext, /home/esten/ami/usaf.json, source=json, header=false)

2015-06-17 Thread Arun (JIRA)
Arun created SPARK-8409:
---

 Summary:  i cant able to read .csv files using read.df() in sparkR 
of spark 1.4 for eg.) mydf-read.df(sqlContext, /home/esten/ami/usaf.json, 
source=json, header=false) 
 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical


Hi, 
In the SparkR shell, I invoke: 
 mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", header = "false") 
I have tried various file types (csv, txt); all fail. 

RESPONSE: ERROR RBackendHandler: load on 1 failed 

[jira] [Updated] (SPARK-8409) In windows cant able to read .csv files using read.df() in sparkR of spark 1.4 for eg.) df_1- read.df(sqlContext, E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-ha

2015-06-17 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun updated SPARK-8409:

Summary:  In windows cant able to read .csv files using read.df() in sparkR 
of spark 1.4 for eg.) df_1- read.df(sqlContext, 
E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv,
 source = csv)  (was:  i cant able to read .csv files using read.df() in 
sparkR of spark 1.4 for eg.) mydf-read.df(sqlContext, 
/home/esten/ami/usaf.json, source=json, header=false) )

  In windows cant able to read .csv files using read.df() in sparkR of spark 
 1.4 for eg.) df_1- read.df(sqlContext, 
 E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv,
  source = csv)
 

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical
  Labels: build

 Hi, 
 In the SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", header = "false") 
 I have tried various file types (csv, txt); all fail. 
 RESPONSE: ERROR RBackendHandler: load on 1 failed 

[jira] [Updated] (SPARK-8409) In windows cant able to read .csv or .json files using read.df() in sparkR of spark 1.4 for eg.) df_1- read.df(sqlContext, E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4

2015-06-17 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun updated SPARK-8409:

Description: 
Hi, 
In the SparkR shell, I invoke: 
 mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", header = "false") 
I have tried various file types (csv, txt); all fail. 

RESPONSE: ERROR RBackendHandler: load on 1 failed 

[jira] [Commented] (SPARK-8409) In windows cant able to read .csv or .json files using read.df()

2015-06-17 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590100#comment-14590100
 ] 

Arun commented on SPARK-8409:
-

Thanks Sivram, I have a few doubts I would like clarified.
1.) How can we increase the memory of the Spark standalone cluster? By default it has only around 234 MB. I am using Spark 1.4 SparkR in a Windows environment. What I found suggests editing spark-env.sh, but what exactly do I have to put in it?
2.) I have a Hortonworks standalone cluster (10.200.202.85:8020); can we make it the master node in the Spark context? If yes, what is the code in a Windows environment?
3.) Or, keeping the Spark cluster as master, can we connect the Hortonworks standalone cluster as a worker node? If possible, what is the code in a Windows environment? If I am wrong, kindly correct me.

TIA,
Arun Gunalan
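These are configuration how-tos rather than part of the bug itself. A minimal SparkR 1.4 sketch, assuming a standalone Spark master is reachable; the master host, memory sizes, and HDFS path below are illustrative placeholders, not values from this issue:
{code}
library(SparkR)

# 1) Give executors more memory than the default pool; the driver's own memory
#    generally has to be set before its JVM starts, e.g. via
#    bin\sparkR --driver-memory 2g  (rather than from an already-running shell).
sc <- sparkR.init(master = "spark://spark-master-host:7077",   # placeholder host
                  appName = "SparkR-example",
                  sparkEnvir = list(spark.executor.memory = "2g"))
sqlContext <- sparkRSQL.init(sc)

# 2)/3) A Hortonworks (Hadoop) cluster is not itself a Spark master or worker,
#    but data stored on its HDFS can be read by giving the full namenode URI.
df <- read.df(sqlContext,
              "hdfs://10.200.202.85:8020/path/to/usaf.json",   # path is illustrative
              source = "json")
{code}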

  In windows cant able to read .csv or .json files using read.df()
 -

 Key: SPARK-8409
 URL: https://issues.apache.org/jira/browse/SPARK-8409
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
 Environment: sparkR API
Reporter: Arun
Priority: Critical
  Labels: build

 Hi, 
 In the SparkR shell, I invoke: 
  mydf <- read.df(sqlContext, "/home/esten/ami/usaf.json", source = "json", header = "false") 
 I have tried various file types (csv, txt); all fail. 
 In SparkR on Spark 1.4 under Windows, for example: 
  df_1 <- read.df(sqlContext, "E:/setup/spark-1.4.0-bin-hadoop2.6/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/nycflights13.csv", source = "csv")
 RESPONSE: ERROR RBackendHandler: load on 1 failed 