Hello,

I am analyzing JSON data from Twitter. An example of the data:

**********
"      \"id\": 433662713886429200,"
"      \"id_str\": \"433662713886429184\","
"      \"text\": \"Hond vast in water in Bargerveen bij Zwartemeer - http://t.co/FqbkOMzYd1 #Zwartemeer #bargerveen #hond #innood\","
"      \"source\": \"<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">TweetDeck</a>\","
**********

I get the contents of the "text" field like this:

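# Match lines that start with optional spaces and "text, and end with a comma
r <- regexpr("^( )*\"text(.*?),$", myjsondata)
text <- regmatches(myjsondata, r)
# Remove the "text": key, every quote-comma pair, and any remaining quotes
txt <- gsub("\"text\":|\",|\"", "", text)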

Unfortunately, the JSON contains other fields with the same name, for
example:

**********
"      \"id\": 433662713886429200,"
"      \"id_str\": \"433662713886429184\","
"      \"text\": \"Hond vast in water in Bargerveen bij Zwartemeer - http://t.co/FqbkOMzYd1 #Zwartemeer #bargerveen #hond #innood\","
"      \"source\": \"<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">TweetDeck</a>\","
...
"      \"entities\":  {"
"        \"hashtags\":  ["
"           {"
"            \"text\": \"Zwartemeer\","
...
"            \"text\": \"bargerveen\","
...
"            \"text\": \"hond\","
etc.
**********

I only want the value of the "text" field that sits between the "id_str"
and "source" fields. I don't want the values of the "text" fields nested
under "hashtags". I understand regular expressions, but I don't see how to
apply a condition that spans multiple lines.
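
To make the multi-line condition concrete, here is a rough sketch of one
way it could be expressed without a multi-line regex. It assumes that
myjsondata is a character vector with one element per line of
pretty-printed JSON, that the tweet text fits on a single line, and that
the wanted "text" line comes directly after the "id_str" line. The sample
vector is a shortened stand-in for the real data, and the helper names
(is_text, is_id_str, prev_is_id_str, keep) are made up for illustration.

```r
# Hypothetical, shortened stand-in for myjsondata (one element per line).
myjsondata <- c(
  "      \"id\": 433662713886429200,",
  "      \"id_str\": \"433662713886429184\",",
  "      \"text\": \"Hond vast in water in Bargerveen bij Zwartemeer - http://t.co/FqbkOMzYd1 #Zwartemeer #bargerveen #hond #innood\",",
  "      \"source\": \"TweetDeck\",",
  "      \"entities\":  {",
  "        \"hashtags\":  [",
  "           {",
  "            \"text\": \"Zwartemeer\",",
  "            \"text\": \"bargerveen\","
)

# Flag the lines that hold a "text" key and the lines that hold "id_str".
is_text   <- grepl("^ *\"text\":", myjsondata)
is_id_str <- grepl("^ *\"id_str\":", myjsondata)

# Shift the "id_str" flags down by one position, so that element i tells
# whether line i - 1 was an "id_str" line.
prev_is_id_str <- c(FALSE, head(is_id_str, -1))

# Keep only "text" lines whose preceding line is "id_str".
keep <- is_text & prev_is_id_str

# Strip the key and opening quote, then the closing quote and comma.
txt <- sub("^ *\"text\": *\"", "", myjsondata[keep])
txt <- sub("\",?$", "", txt)
```

With this sample, the "text" lines under "hashtags" are skipped because
the line before each of them is not an "id_str" line. If the field order
in the real data differs, the adjacency test would need to be adapted.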

I know it's possible to use a JSON library in R, but in my case I can't,
because I get the JSON as raw "clipboard" data.

Thanks !

Mark Stam

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
