I checked with Spark 1.6.1 and it still works fine.
I also checked out the latest source code from the Spark 2.0 branch, built it, 
and got the same issue.

I think it may be caused by the API change to Dataset in Spark 2.0?
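If rewriting is ever an option, one workaround is to lowercase all column names once in Spark and write a new copy of the data, so the Hive definition ("userid") matches the Parquet footer exactly. A rough, untested sketch (the output path is a hypothetical placeholder, and a Spark 2.0 `spark` session is assumed):

```scala
// Sketch only: lowercase every column name so the external table's
// lowercase definitions match the Parquet footer exactly.
val df = spark.read.parquet("alluxio://master2:19998/AD_COOKIE_REPORT")
val lowered = df.toDF(df.columns.map(_.toLowerCase): _*)
lowered.write.parquet("alluxio://master2:19998/AD_COOKIE_REPORT_LC")
```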



Regards,
Chanh


> On Aug 5, 2016, at 9:44 AM, Chanh Le <giaosu...@gmail.com> wrote:
> 
> Hi Nicholas,
> Thanks for the information. 
> How did you solve the issue? 
> Did you change the parquet file by renaming the column? 
> I tried changing the column name when creating the table in Hive, without 
> changing the parquet file, but it still shows NULL.
> My parquet files are quite big, so anything that avoids rewriting the 
> parquet would be better.
> 
> 
> Regards,
> Chanh.
> 
> 
>> On Aug 5, 2016, at 2:24 AM, Nicholas Hakobian 
>> <nicholas.hakob...@rallyhealth.com> wrote:
>> 
>> It's due to the casing of the 'I' in userId. Your schema (from printSchema) 
>> names the field "userId", while your external table definition has it as 
>> "userid".
>> 
>> We've run into similar issues with external Parquet tables defined in Hive 
>> with lowercase-only column names and accessed through HiveContext. You should 
>> check out this documentation, as it describes how Spark handles column 
>> definitions:
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-metastore-parquet-table-conversion
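The lookup behavior described above can be sketched in plain Scala (illustrative only; the names below are hypothetical, not Spark internals):

```scala
// Illustrative only: why a case-sensitive lookup misses "userId".
// parquetSchema mimics the file's field names; the lookup argument is
// the lowercase column name from the Hive metastore.
object CaseSensitivityDemo {
  val parquetSchema = Seq("time", "userId", "impression")

  // A strict, case-sensitive lookup: "userid" never matches "userId",
  // so the column resolves to all-NULL values.
  def findExact(col: String): Option[String] =
    parquetSchema.find(_ == col)

  // The case-insensitive reconciliation described in the linked docs:
  // "userid" is matched to the Parquet field "userId".
  def findReconciled(col: String): Option[String] =
    parquetSchema.find(_.equalsIgnoreCase(col))

  def main(args: Array[String]): Unit = {
    println(findExact("userid"))       // None
    println(findReconciled("userid"))  // Some(userId)
  }
}
```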
>> 
>> 
>> Nicholas Szandor Hakobian, Ph.D.
>> Data Scientist
>> Rally Health
>> nicholas.hakob...@rallyhealth.com
>> 
>> 
>> On Thu, Aug 4, 2016 at 4:53 AM, Chanh Le <giaosu...@gmail.com> wrote:
>> Hi Takeshi, 
>> I have already changed the column type to INT and to String, but I got the 
>> same NULL values. 
>> It only happens with userid, which is why it's so annoying.
>> 
>> thanks and regards, 
>> Chanh
>> 
>> 
>> On Aug 4, 2016 5:59 PM, "Takeshi Yamamuro" <linguin....@gmail.com> wrote:
>> Hi,
>> 
>> When changing the long type into an int one, does the issue still happen?
>> Also, could you show a simpler query that reproduces the issue?
>> 
>> // maropu
>> 
>> On Thu, Aug 4, 2016 at 7:35 PM, Chanh Le <giaosu...@gmail.com> wrote:
>> 
>> Hi everyone,
>> 
>> I have a parquet file that contains data, but when I query it through the 
>> Spark Thrift Server, it shows NULL for userid.
>> As you can see, I can get the data via Spark Scala, but STS cannot.
>> 
>> <Screen Shot 2016-08-04 at 5.32.04 PM.png>
>> 
>> The file schema
>> root
>>  |-- time: string (nullable = true)
>>  |-- topic_id: integer (nullable = true)
>>  |-- interest_id: integer (nullable = true)
>>  |-- inmarket_id: integer (nullable = true)
>>  |-- os_id: integer (nullable = true)
>>  |-- browser_id: integer (nullable = true)
>>  |-- device_type: integer (nullable = true)
>>  |-- device_id: integer (nullable = true)
>>  |-- location_id: integer (nullable = true)
>>  |-- age_id: integer (nullable = true)
>>  |-- gender_id: integer (nullable = true)
>>  |-- website_id: integer (nullable = true)
>>  |-- channel_id: integer (nullable = true)
>>  |-- section_id: integer (nullable = true)
>>  |-- zone_id: integer (nullable = true)
>>  |-- placement_id: integer (nullable = true)
>>  |-- advertiser_id: integer (nullable = true)
>>  |-- campaign_id: integer (nullable = true)
>>  |-- payment_id: integer (nullable = true)
>>  |-- creative_id: integer (nullable = true)
>>  |-- audience_id: integer (nullable = true)
>>  |-- merchant_cate: integer (nullable = true)
>>  |-- ad_default: integer (nullable = true)
>>  |-- userId: long (nullable = true)
>>  |-- impression: integer (nullable = true)
>>  |-- viewable: integer (nullable = true)
>>  |-- click: integer (nullable = true)
>>  |-- click_fraud: integer (nullable = true)
>>  |-- revenue: double (nullable = true)
>>  |-- proceeds: double (nullable = true)
>>  |-- spent: double (nullable = true)
>>  |-- network_id: integer (nullable = true)
>> 
>> 
>> I created the table in Spark Thrift Server with:
>> 
>> CREATE EXTERNAL TABLE ad_cookie_report (time String, advertiser_id int, 
>> campaign_id int, payment_id int, creative_id int, website_id int, channel_id 
>> int, section_id int, zone_id int, ad_default int, placment_id int, topic_id 
>> int, interest_id int, inmarket_id int, audience_id int, os_id int, 
>> browser_id int, device_type int, device_id int, location_id int, age_id int, 
>> gender_id int, merchant_cate int, userid bigint, impression int, viewable 
>> int, click int, click_fraud int, revenue double, proceeds double, spent 
>> double, network_id integer)
>> STORED AS PARQUET LOCATION 'alluxio://master2:19998/AD_COOKIE_REPORT';
>> 
>> But when I query it, all the values come back NULL.
>> 
>> 0: jdbc:hive2://master1:10000> select userid from ad_cookie_report limit 10;
>> +---------+--+
>> | userid  |
>> +---------+--+
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> +---------+--+
>> 10 rows selected (3.507 seconds)
>> 
>> How can I solve this problem? Is it related to the field having an uppercase 
>> letter? How can I change the field name in this situation?
>> 
>> 
>> Regards,
>> Chanh
>> 
>> 
>> 
>> 
>> -- 
>> ---
>> Takeshi Yamamuro
>> 
>> 
> 
