RE: Help! Stuck using withColumn

2015-08-27 Thread Saif.A.Ellafi
Hello, thank you for the response.

I found a blog where a guy explains that it is not possible to join columns 
from different data frames.

I was trying to modify one column’s information by selecting it and then 
replacing the original DataFrame column. I found another way.

Thanks
Saif
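
For readers following along: the message above does not say which workaround 
was used, so the following is only a minimal sketch of one common way to modify 
a column without mixing two DataFrames. It assumes Spark 2.0 or later (a 
SparkSession rather than the SQLContext of the Spark release discussed in this 
thread), and the sample rows and the doubleBalance helper are hypothetical. The 
idea is that withColumn, given the name of an existing column and an expression 
built from the same DataFrame, returns a new DataFrame with that column 
replaced.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ReplaceColumnSketch {
  // Overwrites `month_balance` with a value derived from itself.
  // The expression only references columns of `df`, so it resolves cleanly.
  def doubleBalance(df: DataFrame): DataFrame =
    df.withColumn("month_balance", col("month_balance") * 2)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("replace-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the `data` DataFrame from the thread.
    val data = Seq((100.0, "2015-07"), (250.0, "2015-08"))
      .toDF("month_balance", "month_date_curr")

    doubleBalance(data).show()
    spark.stop()
  }
}
```

The column order and the other columns are left untouched; only the values of 
month_balance change.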

From: Silvio Fiorito [mailto:silvio.fior...@granturing.com]
Sent: Wednesday, August 26, 2015 8:54 PM
To: Ellafi, Saif A.; user@spark.apache.org
Subject: Re: Help! Stuck using withColumn

Hi Saif,

In both cases you’re referencing columns that don’t exist in the current 
DataFrame.

In the first email you did a select and then a withColumn for ‘month_date_curr’ 
on the resulting DF, but that column does not exist there, because you selected 
only ‘month_balance’.

In the second email you’re using 2 different DFs and trying to select a column 
from one in a withColumn on the other, which won’t work. Also, there are 
no explicit column names given to either DF, so that column doesn’t exist.

Did you intend to do a join instead?

Thanks,
Silvio

From: saif.a.ell...@wellsfargo.com
Date: Wednesday, August 26, 2015 at 6:06 PM
To: saif.a.ell...@wellsfargo.com, user@spark.apache.org
Subject: RE: Help! Stuck using withColumn

I can reproduce this even simpler with the following:

val gf = sc.parallelize(Array(3,6,4,7,3,4,5,5,31,4,5,2)).toDF("ASD")
val ff = sc.parallelize(Array(4,6,2,3,5,1,4,6,23,6,4,7)).toDF("GFD")

gf.withColumn("DSA", ff.col("GFD"))

org.apache.spark.sql.AnalysisException: resolved attribute(s) GFD#421 missing from ASD#419 in operator !Project [ASD#419,GFD#421 AS DSA#422];
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:121)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:98)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:42)
at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931)
at org.apache.spark.sql.DataFrame.init(DataFrame.scala:131)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154)
at org.apache.spark.sql.DataFrame.select(DataFrame.scala:595)
at org.apache.spark.sql.DataFrame.withColumn(DataFrame.scala:1039)


From: saif.a.ell...@wellsfargo.com [mailto:saif.a.ell...@wellsfargo.com]
Sent: Wednesday, August 26, 2015 6:47 PM
To: user@spark.apache.org
Subject: Help! Stuck using withColumn

This simple command call:

val final_df = data.select("month_balance").withColumn("month_date",
  data.col("month_date_curr"))

Is throwing:

org.apache.spark.sql.AnalysisException: resolved attribute(s) month_date_curr#324 missing from month_balance#234 in operator !Project [month_balance#234, month_date_curr#324 AS month_date_curr#408];
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:121)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:98)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:42)
at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931)
at org.apache.spark.sql.DataFrame.init(DataFrame.scala:131)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154)
at org.apache.spark.sql.DataFrame.select(DataFrame.scala:595)
at org.apache.spark.sql.DataFrame.withColumn(DataFrame.scala:1039)





Help! Stuck using withColumn

2015-08-26 Thread Saif.A.Ellafi
This simple command call:

val final_df = data.select("month_balance").withColumn("month_date",
  data.col("month_date_curr"))

Is throwing:

org.apache.spark.sql.AnalysisException: resolved attribute(s) month_date_curr#324 missing from month_balance#234 in operator !Project [month_balance#234, month_date_curr#324 AS month_date_curr#408];
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:121)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:98)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:42)
at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931)
at org.apache.spark.sql.DataFrame.init(DataFrame.scala:131)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154)
at org.apache.spark.sql.DataFrame.select(DataFrame.scala:595)
at org.apache.spark.sql.DataFrame.withColumn(DataFrame.scala:1039)





RE: Help! Stuck using withColumn

2015-08-26 Thread Saif.A.Ellafi
I can reproduce this even simpler with the following:

val gf = sc.parallelize(Array(3,6,4,7,3,4,5,5,31,4,5,2)).toDF("ASD")
val ff = sc.parallelize(Array(4,6,2,3,5,1,4,6,23,6,4,7)).toDF("GFD")

gf.withColumn("DSA", ff.col("GFD"))

org.apache.spark.sql.AnalysisException: resolved attribute(s) GFD#421 missing from ASD#419 in operator !Project [ASD#419,GFD#421 AS DSA#422];
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:121)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:98)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:42)
at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931)
at org.apache.spark.sql.DataFrame.init(DataFrame.scala:131)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154)
at org.apache.spark.sql.DataFrame.select(DataFrame.scala:595)
at org.apache.spark.sql.DataFrame.withColumn(DataFrame.scala:1039)







Re: Help! Stuck using withColumn

2015-08-26 Thread Silvio Fiorito
Hi Saif,

In both cases you’re referencing columns that don’t exist in the current 
DataFrame.

In the first email you did a select and then a withColumn for ‘month_date_curr’ 
on the resulting DF, but that column does not exist there, because you selected 
only ‘month_balance’.
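
To make the point about the first email concrete, here is a minimal sketch of 
two ways to keep the source column in scope. It assumes Spark 2.0 or later 
(SparkSession instead of the SQLContext used at the time of this thread); the 
sample rows and the helper names viaSelect and viaWithColumn are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SameFrameSketch {
  // Option 1: select both columns in one step, renaming on the way.
  def viaSelect(data: DataFrame): DataFrame =
    data.select(col("month_balance"), col("month_date_curr").as("month_date"))

  // Option 2: add the column while month_date_curr is still present,
  // then narrow the result down to the columns that are needed.
  def viaWithColumn(data: DataFrame): DataFrame =
    data
      .withColumn("month_date", col("month_date_curr"))
      .select("month_balance", "month_date")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("same-frame-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the `data` DataFrame from the thread.
    val data = Seq((100.0, "2015-07"), (250.0, "2015-08"))
      .toDF("month_balance", "month_date_curr")

    viaSelect(data).show()
    viaWithColumn(data).show()
    spark.stop()
  }
}
```

Both variants build every expression from the DataFrame they are applied to, 
which is what the analyzer requires.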

In the second email you’re using 2 different DFs and trying to select a column 
from one in a withColumn on the other, which won’t work. Also, there are 
no explicit column names given to either DF, so that column doesn’t exist.

Did you intend to do a join instead?
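
If a join is what was intended, the two DataFrames from the second email need 
a shared key first. The following is only a sketch under assumptions: it 
assumes Spark 2.0 or later (SparkSession), that rows should be paired by 
position, and it introduces a hypothetical key column named idx via the RDD 
method zipWithIndex. None of this comes from the original emails.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinByIndexSketch {
  def combine(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val sc = spark.sparkContext

    // zipWithIndex pairs each value with its position, which gives both
    // DataFrames a common key ("idx" is a made-up name for this sketch).
    val gf = sc.parallelize(Seq(3, 6, 4, 7, 3, 4, 5, 5, 31, 4, 5, 2))
      .zipWithIndex().toDF("ASD", "idx")
    val ff = sc.parallelize(Seq(4, 6, 2, 3, 5, 1, 4, 6, 23, 6, 4, 7))
      .zipWithIndex().toDF("GFD", "idx")

    // After the join, GFD is an attribute of the joined DataFrame,
    // so it can be selected and renamed without an AnalysisException.
    gf.join(ff, "idx").select($"idx", $"ASD", $"GFD".as("DSA"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-by-index-sketch")
      .getOrCreate()

    combine(spark).orderBy("idx").show()
    spark.stop()
  }
}
```

Pairing by position is a design choice made for this example; when the data 
has a natural key, joining on that key is usually the better option.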

Thanks,
Silvio
