[jira] [Updated] (SPARK-6119) better support for working with missing data

2015-03-27 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6119:
---
Description: 
Real-world data can be messy. An important feature of data frames is support 
for missing data. We should figure out what we want to do here.

Some ideas:

1. Support replacing all null values in a column (or in all columns) with a 
fixed value.

2. Support replacing a set of values with another set of values.

3. Support interpolation of missing values.


  was:
Real-world data can be messy. An important feature of data frames is support 
for missing data. We should figure out what we want to do here.

Some ideas:

1. Support replacing all null values in a column with a fixed value.

2. Support replacing all null values in all columns with a fixed value.



 better support for working with missing data
 

 Key: SPARK-6119
 URL: https://issues.apache.org/jira/browse/SPARK-6119
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
  Labels: DataFrame

 Real-world data can be messy. An important feature of data frames is support 
 for missing data. We should figure out what we want to do here.
 Some ideas:
 1. Support replacing all null values in a column (or in all columns) with a 
 fixed value.
 2. Support replacing a set of values with another set of values.
 3. Support interpolation of missing values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6119) better support for working with missing data

2015-03-27 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6119:
---
Labels: DataFrame  (was: )

 better support for working with missing data
 

 Key: SPARK-6119
 URL: https://issues.apache.org/jira/browse/SPARK-6119
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
  Labels: DataFrame

 Real-world data can be messy. An important feature of data frames is support 
 for missing data. We should figure out what we want to do here.






[jira] [Updated] (SPARK-6119) better support for working with missing data

2015-03-27 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6119:
---
Summary: better support for working with missing data  (was: missing data 
support)

 better support for working with missing data
 

 Key: SPARK-6119
 URL: https://issues.apache.org/jira/browse/SPARK-6119
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
  Labels: DataFrame

 Real-world data can be messy. An important feature of data frames is support 
 for missing data. We should figure out what we want to do here.






[jira] [Updated] (SPARK-6119) better support for working with missing data

2015-03-27 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6119:
---
Description: 
Real-world data can be messy. An important feature of data frames is support 
for missing data. We should figure out what we want to do here.

Some ideas:

1. Support replacing all null values in a column with a fixed value.

2. Support replacing all null values in all columns with a fixed value.


  was:
Real-world data can be messy. An important feature of data frames is support 
for missing data. We should figure out what we want to do here.




 better support for working with missing data
 

 Key: SPARK-6119
 URL: https://issues.apache.org/jira/browse/SPARK-6119
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
  Labels: DataFrame

 Real-world data can be messy. An important feature of data frames is support 
 for missing data. We should figure out what we want to do here.
 Some ideas:
 1. Support replacing all null values in a column with a fixed value.
 2. Support replacing all null values in all columns with a fixed value.






[jira] [Updated] (SPARK-6119) better support for working with missing data

2015-03-27 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6119:
---
Description: 
Real-world data can be messy. An important feature of data frames is support 
for missing data. We should figure out what we want to do here.

Some ideas:

1. Support replacing all null values in a column (or in all columns) with a 
fixed value.

2. Support dropping rows with null values (dropna).

3. Support replacing a set of values with another set of values (i.e. map join).



  was:
Real-world data can be messy. An important feature of data frames is support 
for missing data. We should figure out what we want to do here.

Some ideas:

1. Support replacing all null values in a column (or in all columns) with a 
fixed value.

2. Support replacing a set of values with another set of values.

3. Support interpolation of missing values.



 better support for working with missing data
 

 Key: SPARK-6119
 URL: https://issues.apache.org/jira/browse/SPARK-6119
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
  Labels: DataFrame

 Real-world data can be messy. An important feature of data frames is support 
 for missing data. We should figure out what we want to do here.
 Some ideas:
 1. Support replacing all null values in a column (or in all columns) with a 
 fixed value.
 2. Support dropping rows with null values (dropna).
 3. Support replacing a set of values with another set of values (i.e. map 
 join).
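
To make the three ideas above concrete, here is a minimal plain-Python sketch 
of their intended semantics. It is not Spark code and not a proposed API: rows 
are modelled as dicts, null is modelled as None, and the names fill_nulls, 
drop_nulls and replace_values are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, Hashable, List, Mapping, Optional, Sequence

# Assumption: a "row" is a dict from column name to value, and null is None.
Row = Dict[str, Any]


def _targets(row: Row, columns: Optional[Sequence[str]]) -> List[str]:
    """Columns to touch: the requested ones present in the row, or all of them."""
    if columns is None:
        return list(row)
    return [c for c in columns if c in row]


def fill_nulls(rows: Sequence[Row], value: Any,
               columns: Optional[Sequence[str]] = None) -> List[Row]:
    """Idea 1: replace nulls in the given columns (or all columns) with a fixed value."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left unchanged
        for col in _targets(row, columns):
            if new_row[col] is None:
                new_row[col] = value
        out.append(new_row)
    return out


def drop_nulls(rows: Sequence[Row],
               columns: Optional[Sequence[str]] = None) -> List[Row]:
    """Idea 2: drop every row that has a null in any of the considered columns."""
    return [dict(row) for row in rows
            if all(row[c] is not None for c in _targets(row, columns))]


def replace_values(rows: Sequence[Row], mapping: Mapping[Hashable, Any],
                   columns: Optional[Sequence[str]] = None) -> List[Row]:
    """Idea 3: replace each value found in `mapping` with its mapped value.

    Assumes the cell values are hashable, since they are used as dict keys.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        for col in _targets(row, columns):
            new_row[col] = mapping.get(new_row[col], new_row[col])
        out.append(new_row)
    return out
```

For example, fill_nulls(rows, 0, ["age"]) fills only the age column, while 
fill_nulls(rows, 0) fills every column. All three helpers return new rows 
rather than mutating their input, which mirrors how a data frame 
transformation would produce a new data frame.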


