Hi Nicholas,
Thanks for the information. 
How did you solve the issue? 
Did you rename the column in the Parquet file itself? 
I tried changing the column name when creating the table in Hive, without 
changing the Parquet file, but it still shows NULL.
My Parquet files are quite big, so anything I can do without rewriting them 
would be better.
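
A quick way to confirm which names differ might look like the sketch below 
(plain Python, no Spark needed). It only compares names and does not fix 
anything. The function name find_name_mismatches is made up for illustration, 
and the two lists are abbreviated from the schema and the CREATE TABLE 
statement quoted below.

```python
# Field names as stored in the Parquet file (case-sensitive), abbreviated
# from the printSchema output quoted in this thread.
parquet_fields = ["time", "placement_id", "userId", "impression"]

# Column names as declared in the table DDL, abbreviated.
table_columns = ["time", "placment_id", "userid", "impression"]


def find_name_mismatches(parquet_fields, table_columns):
    """Return (case_only, missing) for table columns without an exact match.

    case_only maps a table column to the Parquet field that differs from it
    only by case. missing lists table columns that have no Parquet field
    even when case is ignored. (Hypothetical helper, for illustration only.)
    """
    exact = set(parquet_fields)
    by_lower = {name.lower(): name for name in parquet_fields}
    case_only = {}
    missing = []
    for column in table_columns:
        if column in exact:
            continue  # exact, case-sensitive match: nothing to report
        candidate = by_lower.get(column.lower())
        if candidate is not None:
            case_only[column] = candidate
        else:
            missing.append(column)
    return case_only, missing


case_only, missing = find_name_mismatches(parquet_fields, table_columns)
print("case-only mismatches:", case_only)
print("no matching field:", missing)
```

On these abbreviated lists it reports userid as differing from userId only by 
case, and placment_id as having no match at all (the file has placement_id).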


Regards,
Chanh.


> On Aug 5, 2016, at 2:24 AM, Nicholas Hakobian 
> <nicholas.hakob...@rallyhealth.com> wrote:
> 
> It's due to the casing of the 'I' in userId. Your schema (from printSchema) 
> names the field "userId", while your external table definition has it as 
> "userid".
> 
> We've run into similar issues with external Parquet tables that are defined in 
> Hive with lowercase-only column names and accessed through HiveContext. You 
> should check out this documentation, as it describes how Spark handles column 
> definitions:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-metastore-parquet-table-conversion
> 
> 
> Nicholas Szandor Hakobian, Ph.D.
> Data Scientist
> Rally Health
> nicholas.hakob...@rallyhealth.com
> 
> 
> On Thu, Aug 4, 2016 at 4:53 AM, Chanh Le <giaosu...@gmail.com> wrote:
> Hi Takeshi, 
> I have already changed the column type to INT and to String, but I get the 
> same NULL values. 
> It only happens with userid, which is why it is so annoying.
> 
> Thanks and regards, 
> Chanh
> 
> 
> On Aug 4, 2016 5:59 PM, "Takeshi Yamamuro" <linguin....@gmail.com> wrote:
> Hi,
> 
> Does the issue also happen when you change the long type to int?
> Also, could you show a simpler query that reproduces the issue?
> 
> // maropu
> 
> On Thu, Aug 4, 2016 at 7:35 PM, Chanh Le <giaosu...@gmail.com> wrote:
> 
> Hi everyone,
> 
> I have a Parquet file that contains data, but when I query it through Spark 
> Thrift Server (STS) it shows NULL for userid.
> As you can see, I can get the data with Spark in Scala, but not through STS.
> 
> <Screen Shot 2016-08-04 at 5.32.04 PM.png>
> 
> The file schema:
> root
>  |-- time: string (nullable = true)
>  |-- topic_id: integer (nullable = true)
>  |-- interest_id: integer (nullable = true)
>  |-- inmarket_id: integer (nullable = true)
>  |-- os_id: integer (nullable = true)
>  |-- browser_id: integer (nullable = true)
>  |-- device_type: integer (nullable = true)
>  |-- device_id: integer (nullable = true)
>  |-- location_id: integer (nullable = true)
>  |-- age_id: integer (nullable = true)
>  |-- gender_id: integer (nullable = true)
>  |-- website_id: integer (nullable = true)
>  |-- channel_id: integer (nullable = true)
>  |-- section_id: integer (nullable = true)
>  |-- zone_id: integer (nullable = true)
>  |-- placement_id: integer (nullable = true)
>  |-- advertiser_id: integer (nullable = true)
>  |-- campaign_id: integer (nullable = true)
>  |-- payment_id: integer (nullable = true)
>  |-- creative_id: integer (nullable = true)
>  |-- audience_id: integer (nullable = true)
>  |-- merchant_cate: integer (nullable = true)
>  |-- ad_default: integer (nullable = true)
>  |-- userId: long (nullable = true)
>  |-- impression: integer (nullable = true)
>  |-- viewable: integer (nullable = true)
>  |-- click: integer (nullable = true)
>  |-- click_fraud: integer (nullable = true)
>  |-- revenue: double (nullable = true)
>  |-- proceeds: double (nullable = true)
>  |-- spent: double (nullable = true)
>  |-- network_id: integer (nullable = true)
> 
> 
> I created the table in Spark Thrift Server with:
> 
> CREATE EXTERNAL TABLE ad_cookie_report (time String, advertiser_id int, 
> campaign_id int, payment_id int, creative_id int, website_id int, channel_id 
> int, section_id int, zone_id int, ad_default int, placment_id int, topic_id 
> int, interest_id int, inmarket_id int, audience_id int, os_id int, browser_id 
> int, device_type int, device_id int, location_id int, age_id int, gender_id 
> int, merchant_cate int, userid bigint, impression int, viewable int, click 
> int, click_fraud int, revenue double, proceeds double, spent double, 
> network_id integer)
> STORED AS PARQUET LOCATION 'alluxio://master2:19998/AD_COOKIE_REPORT';
> 
> But when I query it, I get NULL for every value.
> 
> 0: jdbc:hive2://master1:10000> select userid from ad_cookie_report limit 10;
> +---------+--+
> | userid  |
> +---------+--+
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> +---------+--+
> 10 rows selected (3.507 seconds)
> 
> How can I solve this problem? Is it related to the field name containing an 
> uppercase letter?
> How can I change the field name in this situation?
> 
> 
> Regards,
> Chanh
> 
> 
> 
> 
> -- 
> ---
> Takeshi Yamamuro
> 
> 
