Re: how to merge two dataframes

Ted Yu Fri, 30 Oct 2015 13:01:16 -0700

I see: you were trying to union a non-Cassandra DF with a Cassandra DF :-(

On Fri, Oct 30, 2015 at 12:57 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:


> Not a bad idea, I suspect, but it doesn't help me. I dumbed down the repro to
> ask for help. In reality, one of my dataframes is a Cassandra DF, so
> cassDF.registerTempTable("df1") registers the temp table in a different
> SQL context (new CassandraSQLContext(sc)) from the one the query runs in:
>
>
> scala> sql("select customer_id, uri, browser, epoch from df union all select customer_id, uri, browser, epoch from df1").show()
> org.apache.spark.sql.AnalysisException: no such table df1; line 1 pos 103
>         at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>         at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:225)
>         at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:233)
>         at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:229)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222)
>         at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:221)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:242)
>
>
> On Fri, Oct 30, 2015 at 3:34 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> How about the following?
>>
>> scala> df.registerTempTable("df")
>> scala> df1.registerTempTable("df1")
>> scala> sql("select customer_id, uri, browser, epoch from df union select customer_id, uri, browser, epoch from df1").show()
>> +-----------+-------------+-------+-----+
>> |customer_id|          uri|browser|epoch|
>> +-----------+-------------+-------+-----+
>> |        999|http://foobar|firefox| 1234|
>> |        888|http://foobar|     ie|12343|
>> +-----------+-------------+-------+-----+
>>
>> Cheers
>>
>> On Fri, Oct 30, 2015 at 12:11 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
>>
>>> Hi folks,
>>>
>>> I need to "append" two dataframes. I was hoping to use unionAll, but it
>>> seems that this operation treats the underlying dataframes as a sequence
>>> of columns (matched by position) rather than as a map (matched by name).
>>>
>>> In particular, my problem is that the columns in the two DFs are not in
>>> the same order. Notice that my customer_id somehow comes out as a string:
>>>
>>> This is Spark 1.4.1.
>>>
>>> // spark-shell pre-imports sqlContext.implicits._, which provides .toDF
>>> case class Test(epoch: Long, browser: String, customer_id: Int, uri: String)
>>> val test = Test(1234L, "firefox", 999, "http://foobar")
>>>
>>> case class Test1(customer_id: Int, uri: String, browser: String, epoch: Long)
>>> val test1 = Test1(888, "http://foobar", "ie", 12343)
>>> val df = sc.parallelize(Seq(test)).toDF
>>> val df1 = sc.parallelize(Seq(test1)).toDF
>>> df.unionAll(df1)
>>>
>>> // res2: org.apache.spark.sql.DataFrame = [epoch: bigint, browser: string, customer_id: string, uri: string]
>>>
>>> Is unionAll the wrong operation? Any special incantations? Or any advice
>>> on how else to get this to succeed?
>>>
>>
>>
>
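
The column-order problem in Yana's repro can be modeled without Spark. This is a simplified, hypothetical sketch: the `SchemaAlign` object and its helpers are illustrative, not Spark API. A schema is an ordered list of (name, type) pairs, a positional union pairs columns by index, and a deliberately simplified widening rule (same type stays, int with bigint becomes bigint, any other mix becomes string) is enough to reproduce the res2 schema quoted above. Reordering the second schema by name first makes every pair line up.

```scala
// Hypothetical, Spark-free model of the column-order problem in this thread.
// A schema is an ordered list of (columnName, typeName) pairs.
object SchemaAlign {
  type Schema = Seq[(String, String)]

  // Schemas of the two case classes from the thread, in declaration order.
  val testSchema: Schema =
    Seq("epoch" -> "bigint", "browser" -> "string", "customer_id" -> "int", "uri" -> "string")
  val test1Schema: Schema =
    Seq("customer_id" -> "int", "uri" -> "string", "browser" -> "string", "epoch" -> "bigint")

  // Simplified widening for one positional pair: identical types are kept,
  // int with bigint widens to bigint, and any other mix becomes string.
  // This reproduces res2 from the thread; it is NOT Spark's full rule set.
  def widen(a: String, b: String): String =
    if (a == b) a
    else if (Set(a, b) == Set("int", "bigint")) "bigint"
    else "string"

  // Positional union: column i of `left` is paired with column i of `right`,
  // and the result keeps the left-hand names.
  def positionalUnionSchema(left: Schema, right: Schema): Schema = {
    require(left.size == right.size, "schemas have different column counts")
    left.zip(right).map { case ((name, lt), (_, rt)) => name -> widen(lt, rt) }
  }

  // Positions where the two inputs disagree on the column name.
  def positionalMismatches(left: Schema, right: Schema): Seq[(String, String)] =
    left.zip(right).collect { case ((l, _), (r, _)) if l != r => (l, r) }

  // Reorder `right` to follow the column order of `left`, failing loudly
  // if a column is missing instead of silently pairing the wrong ones.
  def alignByName(left: Schema, right: Schema): Schema = {
    require(left.size == right.size, "schemas have different column counts")
    val byName = right.toMap
    left.map { case (name, _) =>
      name -> byName.getOrElse(name, throw new NoSuchElementException(s"missing column: $name"))
    }
  }
}
```

For example, `SchemaAlign.positionalUnionSchema(SchemaAlign.testSchema, SchemaAlign.test1Schema)` gives customer_id the type string, as in res2, while unioning against `alignByName(testSchema, test1Schema)` keeps every type unchanged. In Spark 1.4 itself, the analogous step is to project df1 onto df's column order before the union, e.g. `df.unionAll(df1.select(df.columns.map(col): _*))` with `import org.apache.spark.sql.functions.col`. That is the DataFrame counterpart of Ted's SQL suggestion, where both SELECT lists name the columns in the same order. Whether it also works for the Cassandra DataFrame, which lives in a different SQL context, is not shown in the thread.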
