[jira] [Created] (SPARK-3202) Manipulating columns in CSV file or Transpose of Array[Array[String]] RDD
Hingorani, Vineet created SPARK-3202:

Summary: Manipulating columns in CSV file or Transpose of Array[Array[String]] RDD
Key: SPARK-3202
URL: https://issues.apache.org/jira/browse/SPARK-3202
Project: Spark
Issue Type: Documentation
Components: Documentation
Reporter: Hingorani, Vineet

Hello all,

Could someone help me with manipulating CSV file data? I have semicolon-separated CSV data that includes doubles and strings, and I want to calculate the maximum/average of a column. When I read the file using sc.textFile("test.csv").map(_.split(";")), each field is read as a string. Could someone explain how to do the above manipulation? Or maybe there is some way to take the transpose of the data and then manipulate the rows?

Thank you in advance. I have been struggling with this for quite some time.

Regards,
Vineet

--
This message was sent by Atlassian JIRA (v6.2#6252)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
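A minimal sketch of one way to get the maximum and average of a column, written against plain Scala collections so it runs without a cluster. The sample rows, the object name ColumnStats, and the helpers parse and numericColumn are hypothetical and not taken from the issue. No transpose is needed here: the column is picked out of each row by index and converted to Double, and values that do not parse are skipped.

```scala
import scala.util.Try

object ColumnStats {
  // Hypothetical sample rows standing in for the lines of test.csv.
  val lines: Seq[String] = Seq(
    "alice;1.5;x",
    "bob;4.0;y",
    "carol;n/a;z",
    "dave;2.5;w"
  )

  // Split each line on ';' (limit -1 keeps trailing empty fields).
  def parse(lines: Seq[String]): Seq[Array[String]] =
    lines.map(_.split(";", -1))

  // Extract column `idx` as doubles, skipping short rows and
  // values that cannot be parsed as a number.
  def numericColumn(rows: Seq[Array[String]], idx: Int): Seq[Double] =
    rows
      .filter(_.length > idx)
      .flatMap(r => Try(r(idx).trim.toDouble).toOption)

  def main(args: Array[String]): Unit = {
    val col = numericColumn(parse(lines), 1)
    val max = col.max
    val avg = col.sum / col.size
    println(s"max=$max avg=$avg")
  }
}
```

The same map/filter/flatMap steps should carry over to the RDD returned by sc.textFile; as far as I know, an RDD[Double] also offers max(), mean(), and stats(), so the last two lines could use those instead of computing the average by hand.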
[jira] [Closed] (SPARK-3202) Manipulating columns in CSV file or Transpose of Array[Array[String]] RDD
[ https://issues.apache.org/jira/browse/SPARK-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hingorani, Vineet closed SPARK-3202.

Resolution: Invalid
[jira] [Commented] (SPARK-3202) Manipulating columns in CSV file or Transpose of Array[Array[String]] RDD
[ https://issues.apache.org/jira/browse/SPARK-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109129#comment-14109129 ]

Hingorani, Vineet commented on SPARK-3202:

Thank you, Sean, for the help regarding the platform. :)
[jira] [Commented] (SPARK-2360) CSV import to SchemaRDDs
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106851#comment-14106851 ]

Hingorani, Vineet commented on SPARK-2360:

Hello Michael,

I saw your comment thread in a mail archive about being able to manipulate CSV files using Spark. Could you please tell me whether this functionality is available in the latest release of Spark? I have installed the latest version and am running it on my local machine.

Thank you

Regards,
Vineet Hingorani
Developer Associate
Custom Development Strategic Projects group (CDSP)
Products Innovation (PI)
SAP SE
WDF 03, C3.03
E vineet.hingor...@sap.com

CSV import to SchemaRDDs

Key: SPARK-2360
URL: https://issues.apache.org/jira/browse/SPARK-2360
Project: Spark
Issue Type: New Feature
Components: SQL
Reporter: Michael Armbrust
Assignee: Hossein Falaki

I think the first step is to design the interface that we want to present to users. Mostly this is defining options when importing. Off the top of my head:
- What is the separator?
- Provide column names or infer them from the first row.
- How to handle multiple files with possibly different schemas?
- Do we have a method to let users specify the datatypes of the columns, or are they just strings?
- What types of quoting / escaping do we want to support?
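To make the questions in the list above concrete, here is a rough sketch of what such import options could look like, together with a small line splitter that honors a separator and quoting. Everything in it (CsvImportSketch, CsvOptions, ColumnType, splitLine, and the default values) is hypothetical and is not an actual Spark API. It handles single-line records only, with a doubled quote character as the escape inside a quoted field.

```scala
object CsvImportSketch {
  // Hypothetical column types; columns without an entry stay strings.
  sealed trait ColumnType
  case object StringCol extends ColumnType
  case object DoubleCol extends ColumnType

  // Hypothetical option set mirroring the questions in the issue.
  final case class CsvOptions(
    separator: Char = ',',
    quote: Char = '"',
    headerInFirstRow: Boolean = true,
    columnTypes: Map[String, ColumnType] = Map.empty
  )

  // Split one line on the separator. Separators inside a quoted
  // field are kept, and a doubled quote yields one literal quote.
  def splitLine(line: String, opts: CsvOptions): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val cur = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == opts.quote) {
          if (i + 1 < line.length && line.charAt(i + 1) == opts.quote) {
            cur.append(c) // escaped quote: keep one, skip the other
            i += 1
          } else {
            inQuotes = false // closing quote
          }
        } else {
          cur.append(c)
        }
      } else if (c == opts.quote) {
        inQuotes = true // opening quote
      } else if (c == opts.separator) {
        fields += cur.toString
        cur.clear()
      } else {
        cur.append(c)
      }
      i += 1
    }
    fields += cur.toString // last field
    fields.result()
  }
}
```

For example, with CsvOptions(separator = ';'), the line a;"b;c";1.5 splits into the three fields a, b;c, and 1.5, which a plain split on ';' would not do.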