Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
On Tue, 28 May 2013 08:31:35 +0100, Fábio Santos wrote:

> On 28 May 2013 04:19, "Bryan Britten" wrote:
>> I'm not familiar with using read(4096), I'll have to look into that.
>> When I tried to just save the file, my computer just sat in limbo for
>> some time and didn't seem to want to process the command.
>
> That's just file.read with an integer argument. You can read a file by
> chunks by repeatedly calling that function until you get the empty
> string.
>
>> Based on my *extremely* limited knowledge of JSON, that's definitely
>> the type of file this is. Here is a snippet of what is seen when you
>> log in: ...
>
> That's JSON. It's pretty big, but not big enough to stall a slow
> computer for more than half a second.
>
> I've looked for documentation on that method on Twitter. It seems that
> it's part of the Twitter streaming API.
>
> https://dev.twitter.com/docs/streaming-apis
>
> What this means is that the requests aren't supposed to end. They are
> supposed to be read gradually, using the lines to split the response
> into meaningful chunks. That's why you can't read the data and why your
> browser never gets around to downloading it. Both urlopen and your
> browser block while waiting for the request to end.

Are we overlooking the obvious? Why not use one of the Python twitter modules to isolate your app from the nitty-gritty details of the twitter stream:

https://dev.twitter.com/docs/twitter-libraries

--
Given sufficient time, what you put off doing today will get done by itself.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
Thanks to everyone for the help and insight. I think for now I'll just back away from this file and go back to something much easier to practice with.
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
On 28 May 2013 04:19, "Bryan Britten" wrote:
> I'm not familiar with using read(4096), I'll have to look into that.
> When I tried to just save the file, my computer just sat in limbo for
> some time and didn't seem to want to process the command.

That's just file.read with an integer argument. You can read a file by chunks by repeatedly calling that function until you get the empty string.

> Based on my *extremely* limited knowledge of JSON, that's definitely
> the type of file this is. Here is a snippet of what is seen when you
> log in: ...

That's JSON. It's pretty big, but not big enough to stall a slow computer for more than half a second.

I've looked for documentation on that method on Twitter. It seems that it's part of the Twitter streaming API:

https://dev.twitter.com/docs/streaming-apis

What this means is that the requests aren't supposed to end. They are supposed to be read gradually, using the lines to split the response into meaningful chunks. That's why you can't read the data and why your browser never gets around to downloading it. Both urlopen and your browser block while waiting for the request to end.

Here's more info on streaming requests in their docs:

https://dev.twitter.com/docs/streaming-apis/processing

For streaming requests in Python, I would point you to the requests library, but I am not sure it handles streaming requests.
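The read-until-empty loop described above can be sketched like this (just a sketch; io.BytesIO stands in for the urlopen response object so it runs offline, and the 4096-byte chunk size is the one suggested earlier in the thread):

```python
from io import BytesIO

# Stand-in for the file-like object returned by urllib.urlopen().
response = BytesIO(b"some fairly long stream of bytes ..." * 10)

chunks = []
while True:
    chunk = response.read(4096)  # read at most 4096 bytes
    if not chunk:                # empty result means end of stream
        break
    chunks.append(chunk)

data = b"".join(chunks)
```

Against a streaming endpoint that never ends, this loop never terminates either; that is exactly the behaviour being described.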
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
On Monday, May 27, 2013 7:58:05 PM UTC-4, Dave Angel wrote:
> On 05/27/2013 04:47 PM, Bryan Britten wrote:
>> [snip question and code]
>>
>> I get a memory issue. I'm running 32-bit Python 2.7 with 4 gigs of
>> RAM if that helps at all.
>
> Which OS?

I'm operating on Windows 7.

> The first question I'd ask is how big this file is. I can't tell,
> since it needs a user name & password to actually get the file.

If you have Twitter, you can just use your log-in information to access the file.

> But it's not unusual to need at least double that space in memory, and
> in Windoze you're limited to two gig max, regardless of how big your
> hardware might be.
>
> If you separately fetch the file, then you can experiment with it,
> including cutting it down to a dozen lines, and see if you can deal
> with that much.
>
> How could you fetch it? With wget, with a browser (and saveAs), with a
> simple loop which uses read(4096) repeatedly and writes each block to
> a local file. Don't forget to use 'wb', as you don't know yet what
> line endings it might use.

I'm not familiar with using read(4096), I'll have to look into that. When I tried to just save the file, my computer just sat in limbo for some time and didn't seem to want to process the command.

> Once you have an idea what the data looks like, you can answer such
> questions as whether it's json at all, whether the lines each contain
> a single json record, or what.

Based on my *extremely* limited knowledge of JSON, that's definitely the type of file this is. Here is a snippet of what is seen when you log in:

{"created_at":"Tue May 28 03:09:23 + 2013","id":339216806461972481,"id_str":"339216806461972481","text":"RT @aleon_11: Sigo creyendo que las noches lluviosas me acercan mucho m\u00e1s a ti!","source":"\u003ca href=\"http:\/\/blackberry.com\/twitter\" rel=\"nofollow\"\u003eTwitter for BlackBerry\u00ae\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":310910123,"id_str":"310910123","name":"\u2661","screen_name":"LaMarielita_","location":"","url":null,"description":"MERCADOLOGA & PUBLICISTA EN PROCESO, AMO A MI DIOS & MI FAMILIA\u2665 ME ENCANTA REIRME , MOLESTAR & HABLAR :D BFF, pancho, ale & china :) LY\u2661","protected":false,"followers_count":506,"friends_count":606,"listed_count":1,"created_at":"Sat Jun 04 15:24:19 + 2011","favourites_count":207,"utc_offset":-25200,"time_zone":"Mountain Time (US & Canada)","geo_enabled":false,"
verified":false,"statuses_count":17241,"lang":"es","contributors_enabled":false,"is_translator":false,"profile_background_color":"FF6699","profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme11\/bg.gif","profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme11\/bg.gif","profile_background_tile":true,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3720425493\/13a48910e56ca34edeea07ff04075c77_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3720425493\/13a48910e56ca34edeea07ff04075c77_normal.jpeg","profile_link_color":"B40B43","profile_sidebar_border_color":"CC3366","profile_sidebar_fill_color":"E5507E","profile_text_color":"362720","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Tue May 2 8 02:57:40 + 2013","id":339213856922537984,"id_str":"339213856922537984","text":"Sigo creyendo que las noches lluviosas me acercan mucho m\u00e1s a ti!","source":"web","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":105252134,"id_str":"105252134","name":"Alejandra Le\u00f3n","screen_name":"aleon_11","location":"Guatemala","url":null,"description":"La vida se disfruta m\u00e1s, cuando no se le pone tanta importancia.","protecte
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
On 05/27/2013 04:47 PM, Bryan Britten wrote:
> Hey, everyone!
>
> I'm very new to Python and have only been using it for a couple of
> days, but have some experience in programming (albeit mostly
> statistical programming in SAS or R), so I'm hoping someone can answer
> this question in a technical way, but without using an abundant amount
> of jargon.
>
> The issue I'm having is that I'm trying to pull information from a
> website to practice Python with, but I'm having trouble getting the
> data in a timely fashion. If I use the following code:
>
> import json
> import urllib
>
> urlStr = "https://stream.twitter.com/1/statuses/sample.json"
>
> twtrDict = [json.loads(line) for line in urllib.urlopen(urlStr)]
>
> I get a memory issue. I'm running 32-bit Python 2.7 with 4 gigs of RAM
> if that helps at all.

Which OS?

The first question I'd ask is how big this file is. I can't tell, since it needs a user name & password to actually get the file. But it's not unusual to need at least double that space in memory, and in Windoze you're limited to two gig max, regardless of how big your hardware might be.

If you separately fetch the file, then you can experiment with it, including cutting it down to a dozen lines, and see if you can deal with that much.

How could you fetch it? With wget, with a browser (and saveAs), with a simple loop which uses read(4096) repeatedly and writes each block to a local file. Don't forget to use 'wb', as you don't know yet what line endings it might use.

Once you have an idea what the data looks like, you can answer such questions as whether it's json at all, whether the lines each contain a single json record, or what. For all we know, the file might be a few terabytes in size.

--
DaveA
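The "simple loop which uses read(4096) repeatedly" could look something like this (a sketch, not tested against the real endpoint; io.BytesIO plays the role of the urlopen response, and tempfile just gives the sketch somewhere safe to write):

```python
import os
import tempfile
from io import BytesIO


def save_stream(response, path, block_size=4096):
    """Copy a file-like response to a local file, one block at a time."""
    # 'wb' so we don't make any assumptions about line endings yet.
    with open(path, "wb") as out:
        while True:
            block = response.read(block_size)
            if not block:  # empty read signals end of stream
                break
            out.write(block)


# Demonstration with an in-memory stand-in for the network response:
fake_response = BytesIO(b'{"text": "hello"}\n' * 100)
path = os.path.join(tempfile.mkdtemp(), "sample.json")
save_stream(fake_response, path)
```

Only one block is ever held in memory, so this works the same whether the remote file is a dozen lines or a few terabytes.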
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
On 27 May 2013 22:36, "Bryan Britten" wrote:
>
> Try to not sigh audibly as I ask what I'm sure are two asinine
> questions.
>
> 1) How is this approach different from twtrDict = [json.loads(line)
> for line in urllib.urlopen(urlStr)]?

The suggested approach made use of generators. Just because you can iterate over something, that doesn't mean it is all in memory ;) Check out the difference between range() and xrange() in Python 2.
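The list-versus-generator distinction can be sketched like this (io.StringIO stands in for the urlopen response; the variable names are made up for the example):

```python
import json
from io import StringIO

# Stand-in for the response: three lines, one JSON object each.
lines = StringIO('{"a": 1}\n{"a": 2}\n{"a": 3}\n')

# List comprehension: every parsed dict exists in memory at once.
as_list = [json.loads(line) for line in lines]

lines.seek(0)

# Generator expression: nothing is parsed until you ask for it.
as_gen = (json.loads(line) for line in lines)
first = next(as_gen)  # only now is the first line read and parsed
```

With a list, a never-ending stream means never returning; with a generator, each object can be consumed as it arrives.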
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
On Mon, 27 May 2013 14:29:38 -0700, Bryan Britten wrote:
> Try to not sigh audibly as I ask what I'm sure are two asinine
> questions.
>
> 1) How is this approach different from twtrDict = [json.loads(line)
> for line in urllib.urlopen(urlStr)]?
>
> 2) How do I tell how many JSON objects are on each line?

Your code at (1) creates a single list of all the json objects. The code you replied to loaded each object, assumed you did something with it, and then over-wrote it with the next one.

As for (2) - either inspection, or errors from the json parser.

--
Denis McMahon, denismfmcma...@gmail.com
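The load-use-overwrite pattern described above looks like this in miniature (a sketch; io.StringIO stands in for the response, and summing a field stands in for whatever you actually do with each object):

```python
import json
from io import StringIO

# Stand-in for the streaming response.
response = StringIO('{"n": 1}\n{"n": 2}\n{"n": 3}\n')

total = 0
for line in response:
    obj = json.loads(line)  # obj is over-written on every iteration
    total += obj["n"]       # "do something" with it, then move on
```

At any moment only one parsed object is alive, no matter how many lines the stream produces.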
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
Try to not sigh audibly as I ask what I'm sure are two asinine questions.

1) How is this approach different from twtrDict = [json.loads(line) for line in urllib.urlopen(urlStr)]?

2) How do I tell how many JSON objects are on each line?
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
In article <10be5c62-4c58-4b4f-b00a-82d85ee4e...@googlegroups.com>, Bryan Britten wrote:

> If I use the following code:
>
> import urllib
>
> urlStr = "https://stream.twitter.com/1/statuses/sample.json"
>
> fileHandle = urllib.urlopen(urlStr)
> twtrText = fileHandle.readlines()
>
> It takes hours (upwards of 6 or 7, if not more) to finish computing
> the last command.

I'm not surprised! readlines() reads in the ENTIRE file in one gulp. That's a lot of tweets!

> With that being said, my question is whether there is a more efficient
> manner to do this.

In general, when reading a large file, you want to iterate over the lines of the file and process each one. Something like:

for line in urllib.urlopen(urlStr):
    twtrDict = json.loads(line)

You still need to download and process all the data, but at least you don't need to store it in memory all at once.

There is an assumption here that there's exactly one json object per line. If that's not the case, things might get a little more complicated.
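If a line did turn out to hold more than one JSON object, the standard library can still cope: json.JSONDecoder.raw_decode parses one object from a string and reports where it ended. A sketch (the helper name objects_on_line is made up for the example):

```python
import json

decoder = json.JSONDecoder()


def objects_on_line(line):
    """Yield every JSON object found on a single line of text."""
    line = line.strip()
    idx = 0
    while idx < len(line):
        # raw_decode returns (object, index just past the object).
        obj, end = decoder.raw_decode(line, idx)
        yield obj
        idx = end
        # Skip whitespace between adjacent objects.
        while idx < len(line) and line[idx].isspace():
            idx += 1


objs = list(objects_on_line('{"a": 1} {"b": 2}'))
```

For the one-object-per-line case this degenerates to a single json.loads per line, so it costs nothing to be defensive.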
Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
Hey, everyone!

I'm very new to Python and have only been using it for a couple of days, but have some experience in programming (albeit mostly statistical programming in SAS or R), so I'm hoping someone can answer this question in a technical way, but without using an abundant amount of jargon.

The issue I'm having is that I'm trying to pull information from a website to practice Python with, but I'm having trouble getting the data in a timely fashion. If I use the following code:

import json
import urllib

urlStr = "https://stream.twitter.com/1/statuses/sample.json"

twtrDict = [json.loads(line) for line in urllib.urlopen(urlStr)]

I get a memory issue. I'm running 32-bit Python 2.7 with 4 gigs of RAM if that helps at all.

If I use the following code:

import urllib

urlStr = "https://stream.twitter.com/1/statuses/sample.json"

fileHandle = urllib.urlopen(urlStr)
twtrText = fileHandle.readlines()

It takes hours (upwards of 6 or 7, if not more) to finish computing the last command.

With that being said, my question is whether there is a more efficient manner to do this. I'm worried that if it's taking this long to process the .readlines() command, trying to work with the data is going to be a computational nightmare.

Thanks in advance for any insights or advice!
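The core difference between the two snippets above can be shown without the network at all (a sketch; io.StringIO stands in for the urlopen handle): readlines() cannot return until it has seen the end of the stream, which a Twitter streaming endpoint never reaches, while iterating the handle yields each line as soon as it arrives.

```python
import json
from io import StringIO

# Stand-in for the never-ending response (here mercifully finite).
handle = StringIO('{"id": 1}\n{"id": 2}\n')

# Eager: the whole stream must be read before this returns.
all_lines = handle.readlines()

handle.seek(0)

# Lazy: one line at a time, parsed and then released.
ids = []
for line in handle:
    ids.append(json.loads(line)["id"])
```

With a real streaming response, the eager version blocks forever while the lazy one makes progress on every line.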