[jira] [Updated] (SPARK-6119) better support for working with missing data

2015-03-26 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6119:
---
Description: 
Real world data can be messy. An important feature of data frames is support 
for missing data. We should figure out what we want to do here.

Some ideas:

1. Support replacing all null values for a column (or all columns) with a fixed 
value.

2. Support dropping rows with null values (dropna).

3. Support replacing a set of values with another set of values (i.e. map join)
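
A minimal sketch of the three ideas above, written in plain Python rather than
against the Spark API. It assumes rows are modelled as dicts with None standing
in for null. The helper names fill_nulls, drop_nulls, and replace_values are
hypothetical and are not taken from this issue or from any Spark release.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def fill_nulls(rows: List[Row], value: Any,
               columns: Optional[Iterable[str]] = None) -> List[Row]:
    """Idea 1: replace None in the given columns (or in all columns) with a fixed value."""
    targets = None if columns is None else set(columns)
    return [
        {k: (value if v is None and (targets is None or k in targets) else v)
         for k, v in row.items()}
        for row in rows
    ]


def drop_nulls(rows: List[Row]) -> List[Row]:
    """Idea 2: keep only the rows in which no column is None."""
    return [row for row in rows if all(v is not None for v in row.values())]


def replace_values(rows: List[Row], mapping: Dict[Any, Any], column: str) -> List[Row]:
    """Idea 3: map one column through a lookup table; unmapped values stay unchanged."""
    return [{**row, column: mapping.get(row[column], row[column])} for row in rows]


# Hypothetical sample data, for illustration only.
rows = [
    {"name": "a", "age": 30, "city": "NYC"},
    {"name": "b", "age": None, "city": "SF"},
    {"name": None, "age": 25, "city": None},
]

filled = fill_nulls(rows, 0, columns=["age"])   # only the "age" column is filled
dropped = drop_nulls(rows)                      # only the first row has no None
replaced = replace_values(rows, {"NYC": "New York", "SF": "San Francisco"}, "city")
```

Each helper returns a new list and leaves its input untouched, which mirrors the
immutable style of data frame transformations. The lookup in replace_values is
the small-table side of what the description calls a map join.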



  was:
Real world data can be messy. An important feature of data frames is support 
for missing data. We should figure out what we want to do here.

Some ideas:

1. Support replacing all null values for a column (or all columns) with a fixed 
value.

2. Support replacing a set of values with another set of values.

3. Support interpolation.



> better support for working with missing data
> 
>
> Key: SPARK-6119
> URL: https://issues.apache.org/jira/browse/SPARK-6119
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Labels: DataFrame
>
> Real world data can be messy. An important feature of data frames is support 
> for missing data. We should figure out what we want to do here.
> Some ideas:
> 1. Support replacing all null values for a column (or all columns) with a 
> fixed value.
> 2. Support dropping rows with null values (dropna).
> 3. Support replacing a set of values with another set of values (i.e. map 
> join)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6119) better support for working with missing data

2015-03-26 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6119:
---
Description: 
Real world data can be messy. An important feature of data frames is support 
for missing data. We should figure out what we want to do here.

Some ideas:

1. Support replacing all null values for a column (or all columns) with a fixed 
value.

2. Support replacing a set of values with another set of values.

3. Support interpolation.


  was:
Real world data can be messy. An important feature of data frames is support 
for missing data. We should figure out what we want to do here.

Some ideas:

1. Support replacing all null values for a column with a fixed value.

2. Support replacing all null values for all columns with a fixed value.



> better support for working with missing data
> 
>
> Key: SPARK-6119
> URL: https://issues.apache.org/jira/browse/SPARK-6119
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Labels: DataFrame
>
> Real world data can be messy. An important feature of data frames is support 
> for missing data. We should figure out what we want to do here.
> Some ideas:
> 1. Support replacing all null values for a column (or all columns) with a 
> fixed value.
> 2. Support replacing a set of values with another set of values.
> 3. Support interpolation.






[jira] [Updated] (SPARK-6119) better support for working with missing data

2015-03-26 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6119:
---
Description: 
Real world data can be messy. An important feature of data frames is support 
for missing data. We should figure out what we want to do here.

Some ideas:

1. Support replacing all null values for a column with a fixed value.

2. Support replacing all null values for all columns with a fixed value.


  was:
Real world data can be messy. An important feature of data frames is support 
for missing data. We should figure out what we want to do here.




> better support for working with missing data
> 
>
> Key: SPARK-6119
> URL: https://issues.apache.org/jira/browse/SPARK-6119
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Labels: DataFrame
>
> Real world data can be messy. An important feature of data frames is support 
> for missing data. We should figure out what we want to do here.
> Some ideas:
> 1. Support replacing all null values for a column with a fixed value.
> 2. Support replacing all null values for all columns with a fixed value.






[jira] [Updated] (SPARK-6119) better support for working with missing data

2015-03-26 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6119:
---
Labels: DataFrame  (was: )

> better support for working with missing data
> 
>
> Key: SPARK-6119
> URL: https://issues.apache.org/jira/browse/SPARK-6119
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Labels: DataFrame
>
> Real world data can be messy. An important feature of data frames is support 
> for missing data. We should figure out what we want to do here.






[jira] [Updated] (SPARK-6119) better support for working with missing data

2015-03-26 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6119:
---
Summary: better support for working with missing data  (was: missing data 
support)

> better support for working with missing data
> 
>
> Key: SPARK-6119
> URL: https://issues.apache.org/jira/browse/SPARK-6119
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Labels: DataFrame
>
> Real world data can be messy. An important feature of data frames is support 
> for missing data. We should figure out what we want to do here.


