Hi Nicholas,

Thanks for the information. How did you solve the issue? Did you rewrite the Parquet files with the column renamed? I tried changing the column name in the Hive table definition without touching the Parquet files, but it still shows NULL. My Parquet files are quite big, so anything I can do without rewriting them would be better.

Regards,
Chanh.

> On Aug 5, 2016, at 2:24 AM, Nicholas Hakobian <nicholas.hakob...@rallyhealth.com> wrote:
>
> It's due to the casing of the 'I' in userId. Your schema (from printSchema) names the field "userId", while your external table definition has it as "userid".
>
> We've run into similar issues with external Parquet tables defined in Hive with lowercase-only names and accessed through HiveContext. You should check out this documentation as it describes how Spark handles column definitions:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-metastore-parquet-table-conversion
>
> Nicholas Szandor Hakobian, Ph.D.
> Data Scientist
> Rally Health
> nicholas.hakob...@rallyhealth.com
>
> On Thu, Aug 4, 2016 at 4:53 AM, Chanh Le <giaosu...@gmail.com> wrote:
> Hi Takeshi,
> I have already changed the column type to INT and to String, but I got the same NULL values.
> It only happens with userid, which is why it is so annoying.
>
> Thanks and regards,
> Chanh
>
> On Aug 4, 2016 5:59 PM, "Takeshi Yamamuro" <linguin....@gmail.com> wrote:
> Hi,
>
> When changing the long type into an int one, does the issue also happen?
> And also, could you show a simpler query to reproduce the issue?
>
> // maropu
>
> On Thu, Aug 4, 2016 at 7:35 PM, Chanh Le <giaosu...@gmail.com> wrote:
>
> Hi everyone,
>
> I have a Parquet file that contains data, but when I query it through Spark Thrift Server it shows NULL for userid.
> As you can see, I can get the data with Spark Scala but not with STS.
>
> <Screen Shot 2016-08-04 at 5.32.04 PM.png>
>
> The file schema
> root
>  |-- time: string (nullable = true)
>  |-- topic_id: integer (nullable = true)
>  |-- interest_id: integer (nullable = true)
>  |-- inmarket_id: integer (nullable = true)
>  |-- os_id: integer (nullable = true)
>  |-- browser_id: integer (nullable = true)
>  |-- device_type: integer (nullable = true)
>  |-- device_id: integer (nullable = true)
>  |-- location_id: integer (nullable = true)
>  |-- age_id: integer (nullable = true)
>  |-- gender_id: integer (nullable = true)
>  |-- website_id: integer (nullable = true)
>  |-- channel_id: integer (nullable = true)
>  |-- section_id: integer (nullable = true)
>  |-- zone_id: integer (nullable = true)
>  |-- placement_id: integer (nullable = true)
>  |-- advertiser_id: integer (nullable = true)
>  |-- campaign_id: integer (nullable = true)
>  |-- payment_id: integer (nullable = true)
>  |-- creative_id: integer (nullable = true)
>  |-- audience_id: integer (nullable = true)
>  |-- merchant_cate: integer (nullable = true)
>  |-- ad_default: integer (nullable = true)
>  |-- userId: long (nullable = true)
>  |-- impression: integer (nullable = true)
>  |-- viewable: integer (nullable = true)
>  |-- click: integer (nullable = true)
>  |-- click_fraud: integer (nullable = true)
>  |-- revenue: double (nullable = true)
>  |-- proceeds: double (nullable = true)
>  |-- spent: double (nullable = true)
>  |-- network_id: integer (nullable = true)
>
> I create a table in Spark Thrift Server by.
>
> CREATE EXTERNAL TABLE ad_cookie_report (
>   time String, advertiser_id int, campaign_id int, payment_id int,
>   creative_id int, website_id int, channel_id int, section_id int,
>   zone_id int, ad_default int, placment_id int, topic_id int,
>   interest_id int, inmarket_id int, audience_id int, os_id int,
>   browser_id int, device_type int, device_id int, location_id int,
>   age_id int, gender_id int, merchant_cate int, userid bigint,
>   impression int, viewable int, click int, click_fraud int,
>   revenue double, proceeds double, spent double, network_id integer)
> STORED AS PARQUET LOCATION 'alluxio://master2:19998/AD_COOKIE_REPORT';
>
> But when I query it got all in NULL values.
>
> 0: jdbc:hive2://master1:10000> select userid from ad_cookie_report limit 10;
> +---------+--+
> | userid  |
> +---------+--+
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> | NULL    |
> +---------+--+
> 10 rows selected (3.507 seconds)
>
> How to solve the problem? Is that related to field with Uppercase?
> How to change the field name in this situation.
>
> Regards,
> Chanh
>
> --
> ---
> Takeshi Yamamuro
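
For reference, the name mismatch discussed in this thread can be checked mechanically before creating the table. The following is only a minimal sketch in plain Python (it does not use Spark or Hive), assuming the field names are copied by hand from the printSchema output and from the CREATE TABLE statement quoted above. `find_case_mismatches` and `find_unmatched` are hypothetical helper names, not part of any library, and only a subset of the columns is shown.

```python
# Hypothetical helpers: compare Parquet field names with Hive table column names.

def find_case_mismatches(parquet_fields, table_columns):
    """Return (parquet_name, table_name) pairs that differ only by letter case."""
    by_lower = {name.lower(): name for name in parquet_fields}
    mismatches = []
    for col in table_columns:
        original = by_lower.get(col.lower())
        if original is not None and original != col:
            mismatches.append((original, col))
    return mismatches


def find_unmatched(parquet_fields, table_columns):
    """Return table columns with no Parquet field of that name, even ignoring case."""
    lowered = {name.lower() for name in parquet_fields}
    return [col for col in table_columns if col.lower() not in lowered]


# A subset of the names from the thread, copied by hand (assumption: spelled as quoted).
parquet_fields = ["time", "placement_id", "userId", "impression"]
table_columns = ["time", "placment_id", "userid", "impression"]

print(find_case_mismatches(parquet_fields, table_columns))  # [('userId', 'userid')]
print(find_unmatched(parquet_fields, table_columns))        # ['placment_id']
```

On these names the check reports userId/userid as a case-only mismatch, which is the issue Nicholas described. It also flags placment_id, which the file schema spells placement_id, so that column may be worth checking for NULLs as well.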