I believe this has to do with a crumb that Yahoo embeds in the session; unless 
your request includes the crumb, Yahoo rejects the query. That means you need a 
two-step access: one call to set the session information and extract the crumb, 
and one to actually get the data you are interested in.

I had played with this using web/gethttp and have some code to find the crumb. 
My code is over a year old and I haven't tried it on J902 or above, but I have 
attached it below so you can see if it helps with your problem.

Tom McGuire

NB. OK, so here is my final code, cleaned up and now working after fixing
NB. the double-quote issue (see the second-to-last line of code):

NB. Navigating yahoo.com to programmatically get historical stock prices
NB.
require 'web/gethttp'
require 'regex'

NB. use the shell date command to create a Unix timestamp
NB. (-j and -f are BSD/macOS date flags)
epochtime =: 3 : 0
2!:0 'date -jf ''%m/%d/%Y %H:%M:%S %p'' ''',y,' 05:00:00 PM'' ''+%s'''
)
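As a side note, -j and -f above are BSD/macOS date flags; GNU date on Linux spells the same conversion with -d. A minimal sketch of a Linux variant, assuming the same mm/dd/yyyy input string (epochtimeLinux is my name for it, not part of the original):

```j
NB. GNU (Linux) date variant: -d parses the date string, +%s prints Unix seconds
epochtimeLinux =: 3 : 0
2!:0 'date -d ''',y,' 17:00:00'' +%s'
)
```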

NB. precision functions
ppq =: 9 !: 10 NB. print-precision query
pps =: 9 !: 11 NB. print-precision set
NB. I set the precision to 16 to ensure full printing of the linux timestamps

NB. Conversion of \u00xx escape sequences
HEX=:16#.'0123456789abcdef'i.]

xutf =: 3 : 0
u: HEX tolower 2 }. y
)
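For instance, xutf converts a single escape sequence into its character (a worked example of mine, not from the original post):

```j
xutf '\u002f'    NB. hex 002f is 47, i.e. the character '/'
```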

crumbstr =: '"CrumbStore":{"crumb":"'
NB. The crumb is on the page that links to the historical-data download.
NB. If you fetch the correct first page, you only need to search for
NB. crumbstr above; there will be only one occurrence.
getcrumb =: 3 : 0
NB. find the start index and end index of the crumb
sidx =. (#crumbstr)+({: I. crumbstr E. y)
sstr =. (sidx + i. 30){y
eidx =. {. I. '"' E. sstr

NB. using rxapply convert all \u00xx unicode escape sequences
crumb =. '(\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])' xutf rxapply (i.eidx){sstr
)
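A quick sanity check on a fabricated page fragment; note the trailing padding, which matters because getcrumb blindly takes 30 characters past the marker:

```j
page =: 'junk "CrumbStore":{"crumb":"AbCd\u002fEf"} plus enough trailing padding text'
getcrumb page    NB. 'AbCd/Ef' (the \u002f escape becomes a slash)
```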

financeURL =: 'https://finance.yahoo.com/quote/' NB. AAPL/history?p=AAPL'
histURL =: 'https://query1.finance.yahoo.com/v7/finance/download/'
NB. the histURL needs to have a ticker symbol followed by:
NB. ?period1=<unixts for p1>&period2=<unixts for p2>&interval=1d&events=history&crumb=<crumbval>
NB.
NB. here is a full-fledged quote request from the website itself for Apple:
NB. https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=1543024670&period2=1574560670&interval=1d&events=history&crumb=jZO816Y7CSK

gethistorical=: 3 : 0
'symbol d1 d2' =. y

NB. Create the start URL for the page that carries the crumb and the
NB. historical-download link.
NB. a BASH implementation uses the following format:
NB. sURL =. financeURL,symbol,'/?p=',symbol

NB. But the link to the download of historical prices is:
sURL =. financeURL,symbol,'/history?p=',symbol

NB. Get the response using gethttp; -c cookie.txt saves the session cookies
res =. '-s -c cookie.txt' gethttp sURL

crumb =. getcrumb res

qstr =. '?period1=',(}:epochtime d1),'&period2=',(}:epochtime d2),'&interval=1d&events=history&crumb=',crumb
URL=. histURL,symbol,qstr

NB. It turns out that to get a file download you need to double-quote the URL.
NB. There is a built-in verb for that in J (dquote).
res2 =. '-s -b cookie.txt ' gethttp dquote URL
res2
)
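With all of that defined, a call looks like this; the ticker and mm/dd/yyyy dates are just an illustration:

```j
NB. returns the CSV body of the historical-price download (or Yahoo's error page)
csv =: gethistorical 'AAPL';'11/24/2018';'11/24/2019'
```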


> On Jan 5, 2021, at 12:37 AM, Devon McCormick <devon...@gmail.com> wrote:
> 
> This is not really a J question, but has anyone successfully figured out
> how to download something using https protocol?  It's actually more
> complicated than this.  If I want to get, say, the price and volume history
> for Tesla from Yahoo Finance, the "Download" command there generates a
> string like this:
> https://query1.finance.yahoo.com/v7/finance/download/TSLA?period1=1277769600&period2=1609718400&interval=1d&events=history&includeAdjustedClose=true
> 
> I used to be able to take this string and substitute into it to download
> not only prices for TSLA but any other stock on Yahoo Finance for which I
> knew the ticker.  Now this sort of thing fails when I use my former method
> which was just invoking "wget" via the "shell" command in J.  Instead I get
> an 800+ byte html file with error messages.  I think the switch to https
> from http is to blame but having a query string instead of a filename may
> also be an issue.
> 
> The "wget" method still works for http files like this (to get the famous
> iris data): shell 'wget -O iris.data
> http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'.
> So, I suspect it's probably https that is to blame but I do not have a
> working example of submitting a query string using http so I cannot be
> completely sure other than I think this used to work.
> 
> Any suggestions for an automatable way to do this would be welcome.
> 
> Thanks,
> 
> Devon
> -- 
> 
> Devon McCormick, CFA
> 
> Quantitative Consultant
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
