This is strange. CCing the dev list since it might be a bug.


On Thu, Apr 16, 2015 at 3:18 PM, Cesar Flores <ces...@gmail.com> wrote:

> Never mind. I found the solution:
>
> val newDataFrame = hc.createDataFrame(hiveLoadedDataFrame.rdd,
> hiveLoadedDataFrame.schema)
>
> which translates to converting the data frame to an RDD and back again to a
> data frame. Not the prettiest solution, but at least it solves my problem.
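
For reference, the round-trip workaround above can be wrapped in a small helper. This is only a sketch based on the snippet in this thread; the helper name is hypothetical, and it assumes the Spark 1.3-era `SQLContext`/`HiveContext` API (where `hc` is a `HiveContext`):

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper wrapping the workaround from this thread:
// rebuilding the DataFrame from its RDD and its existing schema makes
// Spark re-resolve every column with fresh attribute IDs, which
// sidesteps "resolved attributes ... missing" analysis errors seen on
// some Hive-loaded frames.
def refresh(sqlContext: SQLContext, df: DataFrame): DataFrame =
  sqlContext.createDataFrame(df.rdd, df.schema)
```

One would then call, e.g., `val clean = refresh(hc, hiveLoadedDataFrame)` and run `where`/`unionAll` against `clean` as usual.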
>
>
> Thanks,
> Cesar Flores
>
>
>
> On Thu, Apr 16, 2015 at 11:17 AM, Cesar Flores <ces...@gmail.com> wrote:
>
>>
>> I have a data frame into which I load data from a Hive table, and my issue
>> is that the data frame seems to be missing the columns that I need to query.
>>
>> For example:
>>
>> val newdataset = dataset.where(dataset("label") === 1)
>>
>> gives me an error like the following:
>>
>> ERROR yarn.ApplicationMaster: User class threw exception: resolved
>> attributes label missing from label, user_id, ... (the rest of the fields of
>> my table)
>> org.apache.spark.sql.AnalysisException: resolved attributes label missing
>> from label, user_id, ... (the rest of the fields of my table)
>>
>> where we can see that the label field actually exists. I managed to solve
>> this issue by updating my syntax to:
>>
>> val newdataset = dataset.where($"label" === 1)
>>
>> which works. However, I cannot use this trick in all my queries. For
>> example, when I try to do a unionAll of two subsets of the same data
>> frame, the error I get is that all my fields are missing.
>>
>> Can someone tell me if I need to do some post-processing after loading
>> from Hive in order to avoid this kind of error?
>>
>>
>> Thanks
>> --
>> Cesar Flores
>>
>
>
>
> --
> Cesar Flores
>