[jira] [Comment Edited] (SPARK-8007) Support resolving virtual columns in DataFrames

2016-12-03 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15718615#comment-15718615
 ] 

Ruslan Dautkhanov edited comment on SPARK-8007 at 12/3/16 7:34 PM:
---

Is {noformat}spark__partition__id{noformat} available in PySpark too? I can't 
find a way to run the same code in PySpark.



> Support resolving virtual columns in DataFrames
> ---
>
> Key: SPARK-8007
> URL: https://issues.apache.org/jira/browse/SPARK-8007
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Joseph Batchik
>
> Create the infrastructure so we can resolve df("SPARK__PARTITION__ID") to 
> SparkPartitionID expression.
> A cool use case is to understand physical data skew:
> {code}
> df.groupBy("SPARK__PARTITION__ID").count()
> {code}
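
The skew check quoted above amounts to counting rows per physical partition. Below is a minimal pure-Python sketch of that idea with no Spark dependency. The helper names `partition_of` and `rows_per_partition` and the modulo partitioning scheme are assumptions made for illustration; they are not Spark's partitioner or API.

```python
from collections import Counter


def partition_of(key, num_partitions):
    """Hypothetical partitioner: place an integer key by modulo."""
    return key % num_partitions


def rows_per_partition(keys, num_partitions):
    """Count rows per partition ID, analogous to grouping by partition ID and counting."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    # Report empty partitions as 0 so that skew stays visible.
    return {pid: counts.get(pid, 0) for pid in range(num_partitions)}


# Skewed sample: the multiples of 4 all land in partition 0.
keys = [0, 4, 8, 12, 16, 1, 2, 3]
counts = rows_per_partition(keys, 4)
print(counts)
```

A partition whose count is far above the others indicates skew. In Spark itself, a similar per-partition count can be obtained with the built-in `spark_partition_id()` function from `pyspark.sql.functions`, for example `df.groupBy(spark_partition_id()).count()`; whether the virtual column name proposed in this issue is resolvable depends on the Spark version.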



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8007) Support resolving virtual columns in DataFrames

2015-07-17 Thread Joseph Batchik (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630820#comment-14630820
 ] 

Joseph Batchik edited comment on SPARK-8007 at 7/17/15 6:01 AM:


Reynold, I have started adding virtual columns to DataFrames and SQL queries 
for SPARK-8003 and SPARK-8007. My initial code is here: 
https://github.com/JDrit/spark/commit/e34d3a7eabbc9c41c2dd85b128b2bb5713039e40.

The one issue I ran into, though, was that the catalyst package cannot access 
org.apache.spark.sql.execution.expressions, where SparkPartitionID resides. For 
prototyping purposes I copied SparkPartitionID to the catalyst package, but I am 
wondering what the best way to deal with that dependency would be.

Can you let me know what you think of my changes and what else needs to be 
done?



> Support resolving virtual columns in DataFrames
> ---
>
> Key: SPARK-8007
> URL: https://issues.apache.org/jira/browse/SPARK-8007
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
>
> Create the infrastructure so we can resolve df("SPARK__PARTITION__ID") to 
> SparkPartitionID expression.
> A cool use case is to understand physical data skew:
> {code}
> df.groupBy("SPARK__PARTITION__ID").count()
> {code}


