Re: Using Hive table for twitter data

2016-06-09 Thread Gopal Vijayaraghavan

> Any reason why that table in Hive cannot read data in?

No idea how you're loading data with Flume, but it isn't doing it right.

>> PARTITIONED BY (datehour INT)

...

>> -rw-r--r--   2 hduser supergroup 433868 2016-06-09 09:52 /twitter_data/FlumeData.1465462333430

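The table is partitioned, but the files sit directly under /twitter_data
rather than in per-partition directories, so Hive has no partitions to read
from. No idea how to get Flume to create those partitions either.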
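
As a rough sketch, a partition can be registered by hand once the files for
one hour sit in their own directory. The directory /twitter_data/2016060909
and the datehour value are made up for illustration; getting Flume to write
into such a directory (for example through time escapes in the HDFS sink
path) is assumed here, not something tested against this setup.

```sql
-- Sketch only: the hourly directory below is hypothetical and must already
-- exist in HDFS with the Flume files inside it.
ADD JAR /home/hduser/jars/hive-serdes-1.0-SNAPSHOT.jar;
USE test;

-- Register one hour of data as a partition of the existing table.
ALTER TABLE tweets ADD IF NOT EXISTS
  PARTITION (datehour = 2016060909)
  LOCATION '/twitter_data/2016060909';

-- If the directories are instead named datehour=<value>, Hive can discover
-- them without a per-partition ALTER:
-- MSCK REPAIR TABLE tweets;

-- Check what Hive now knows about.
SHOW PARTITIONS tweets;
SELECT COUNT(*) FROM tweets WHERE datehour = 2016060909;
```

The other route is to drop the PARTITIONED BY clause from the DDL, in which
case the external table should read the flat files in /twitter_data as they
are.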

Cheers,
Gopal




Re: Using Hive table for twitter data

2016-06-09 Thread Mich Talebzadeh
Thanks Gopal.

That link returns:

404 - OOPS!
Looks like you wandered too far from the herd!

LOL

Any reason why that table in Hive cannot read data in?

Cheers,

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 9 June 2016 at 10:09, Gopal Vijayaraghavan  wrote:

>
> > Has anyone done a recent load of Twitter data into a Hive table?
>
> Not anytime recently, but the Twitter corpus was heavily used to demo Hive.
>
> Here's the original post on auto-learning schemas from an arbitrary
> collection of JSON docs (like a MongoDB dump).
>
> http://hortonworks.com/blog/discovering-hive-schema-in-collections-of-json-documents/
>
>
> Cheers,
> Gopal
>
>
>


Re: Using Hive table for twitter data

2016-06-09 Thread Gopal Vijayaraghavan

> Has anyone done a recent load of Twitter data into a Hive table?

Not anytime recently, but the Twitter corpus was heavily used to demo Hive.

Here's the original post on auto-learning schemas from an arbitrary
collection of JSON docs (like a MongoDB dump).

http://hortonworks.com/blog/discovering-hive-schema-in-collections-of-json-documents/


Cheers,
Gopal




Using Hive table for twitter data

2016-06-09 Thread Mich Talebzadeh
Hi,

I am just exploring this.

Has anyone done a recent load of Twitter data into a Hive table?

I have tried a few approaches.

This is the one I tried:

ADD JAR /home/hduser/jars/hive-serdes-1.0-SNAPSHOT.jar;
--SET hive.support.sql11.reserved.keywords=false;
use test;
drop table if exists tweets;
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<
    text:STRING,
    user1:STRUCT,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY>,
    user_mentions:ARRAY>,
    hashtags:ARRAY>>,
  text STRING,
  user1 STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>,
  in_reply_to_screen_name STRING
)
PARTITIONED BY (datehour INT)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/twitter_data'
;

The table is created without errors, but queries against it return no data.

I use Flume to populate that external directory:

hdfs dfs -ls /twitter_data
-rw-r--r--   2 hduser supergroup 433868 2016-06-09 09:52 /twitter_data/FlumeData.1465462333430
-rw-r--r--   2 hduser supergroup 438933 2016-06-09 09:53 /twitter_data/FlumeData.1465462365382
-rw-r--r--   2 hduser supergroup 559724 2016-06-09 09:53 /twitter_data/FlumeData.1465462403606
-rw-r--r--   2 hduser supergroup 455594 2016-06-09 09:54 /twitter_data/FlumeData.1465462435124

Thanks


Dr Mich Talebzadeh


