[jira] [Created] (SPARK-3202) Manipulating columns in CSV file or Transpose of Array[Array[String]] RDD

2014-08-25 Thread Hingorani, Vineet (JIRA)
Hingorani, Vineet created SPARK-3202:


 Summary: Manipulating columns in CSV file or Transpose of 
Array[Array[String]] RDD
 Key: SPARK-3202
 URL: https://issues.apache.org/jira/browse/SPARK-3202
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Reporter: Hingorani, Vineet


Hello all,

Could someone help me with manipulating CSV file data? I have 
semicolon-separated CSV data that includes doubles and strings, and I want to 
calculate the maximum/average of a column. When I read the file using 
sc.textFile("test.csv").map(_.split(";")), each field is read as a string. 
How can I do this?
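
A minimal sketch of one way to approach the question above, written against plain Scala collections so that it runs without a cluster. The sample lines, the column index, and the names `ColumnStats` and `numericColumn` are assumptions made for illustration; none of them come from the report. The same `map`/`flatMap` steps should carry over to the RDD returned by `sc.textFile`, where an RDD of doubles also offers `stats()` for the maximum and mean.

```scala
import scala.util.Try

object ColumnStats {
  // Parse one column of already-split rows as Double.
  // Fields that are missing or not numeric are skipped rather than failing the job.
  def numericColumn(rows: Seq[Array[String]], col: Int): Seq[Double] =
    rows.flatMap(r => Try(r(col).trim.toDouble).toOption)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample standing in for the lines of test.csv.
    val lines = Seq("a;1.5;x", "b;2.5;y", "c;n/a;z", "d;5.0;w")

    // Split on ';' as in the question; every field is still a String here.
    val rows = lines.map(_.split(";"))

    // Column 1 holds the numeric values in this sample.
    val values = numericColumn(rows, 1)

    println(s"max = ${values.max}")
    println(s"average = ${values.sum / values.size}")
  }
}
```

Skipping unparseable fields is a design choice; if a bad field should be treated as an error, drop the `Try` and let `toDouble` throw.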

Or maybe there is some way to take the transpose of the data and then 
manipulate the rows?
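
For the transpose idea, a rough sketch on plain Scala collections follows. It tags each cell with its column and row index, groups by column, and restores row order inside each group. The names `TransposeSketch` and `transpose` are hypothetical. On an RDD the same shape could presumably be expressed with `flatMap` followed by `groupByKey`, though that shuffles every cell and is unlikely to be needed just for per-column statistics.

```scala
object TransposeSketch {
  // Emit (columnIndex, (rowIndex, value)) for every cell, group by column,
  // then sort each group by row index so the original order is preserved.
  def transpose(rows: Seq[Array[String]]): Seq[Seq[String]] =
    rows.zipWithIndex
      .flatMap { case (row, r) =>
        row.toSeq.zipWithIndex.map { case (value, c) => (c, (r, value)) }
      }
      .groupBy(_._1)
      .toSeq
      .sortBy(_._1)
      .map { case (_, cells) => cells.map(_._2).sortBy(_._1).map(_._2) }

  def main(args: Array[String]): Unit = {
    // Hypothetical rows, already split on ';'.
    val rows = Seq(Array("a", "1.5"), Array("b", "2.5"), Array("c", "5.0"))
    transpose(rows).foreach(column => println(column.mkString(";")))
  }
}
```

Rows of unequal length are tolerated here: a short row simply contributes nothing to the later columns.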

Thank you in advance; I have been struggling with this for quite some time.

Regards,
Vineet



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-3202) Manipulating columns in CSV file or Transpose of Array[Array[String]] RDD

2014-08-25 Thread Hingorani, Vineet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hingorani, Vineet closed SPARK-3202.


Resolution: Invalid







[jira] [Commented] (SPARK-3202) Manipulating columns in CSV file or Transpose of Array[Array[String]] RDD

2014-08-25 Thread Hingorani, Vineet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109129#comment-14109129
 ] 

Hingorani, Vineet commented on SPARK-3202:
--

Thank you, Sean, for the help regarding the platform. :)







[jira] [Commented] (SPARK-2360) CSV import to SchemaRDDs

2014-08-22 Thread Hingorani, Vineet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106851#comment-14106851
 ] 

Hingorani, Vineet commented on SPARK-2360:
--

Hello Michael,

I saw your comment thread on a mail archive about being able to 
manipulate CSV files using Spark. Could you please tell me whether this 
functionality is available in the latest release of Spark? I have installed 
the latest version and am running it on my local machine.

Thank you

Regards,

Vineet Hingorani
Developer Associate
Custom Development & Strategic Projects group (CDSP)
Products & Innovation (PI)
SAP SE
WDF 03, C3.03
E vineet.hingor...@sap.com



 CSV import to SchemaRDDs
 

 Key: SPARK-2360
 URL: https://issues.apache.org/jira/browse/SPARK-2360
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Michael Armbrust
Assignee: Hossein Falaki

 I think the first step is to design the interface that we want to present to 
 users.  Mostly this is defining options when importing.  Off the top of my 
 head:
 - What is the separator?
 - Provide column names or infer them from the first row?
 - How to handle multiple files with possibly different schemas?
 - Do we have a method to let users specify the datatypes of the columns, or 
 are they just strings?
 - What types of quoting / escaping do we want to support?
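
 To make two of the questions above concrete (separator and quoting), here is a small illustrative sketch in plain Scala. The `CsvOptions` case class, its field names, and `splitLine` are hypothetical and are not part of Spark or of any interface proposed in this issue; schema handling and type inference are left out entirely.

```scala
object CsvOptionsSketch {
  // Hypothetical option set; none of these names come from Spark.
  final case class CsvOptions(
    separator: Char = ',',     // "What is the separator?"
    useHeader: Boolean = true, // take column names from the first row
    quote: Char = '"'          // character that encloses quoted fields
  )

  // Split one line on the separator, treating separators inside quotes as data.
  // Escaped (doubled) quotes inside a quoted field are not handled in this sketch.
  def splitLine(line: String, opts: CsvOptions): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    for (ch <- line) {
      if (ch == opts.quote) {
        inQuotes = !inQuotes
      } else if (ch == opts.separator && !inQuotes) {
        fields += current.toString
        current.clear()
      } else {
        current.append(ch)
      }
    }
    fields += current.toString
    fields.result()
  }

  def main(args: Array[String]): Unit = {
    val opts = CsvOptions(separator = ';')
    println(splitLine("a;\"b;c\";d", opts))
  }
}
```

 Keeping the options in one immutable value is just one possible design; it makes the defaults visible in a single place and keeps the parsing function free of global state.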


