Your question about my platform got me thinking. I set up a new venv using python3.8 (instead of 3.10) and it ran without any warnings. Haven't looked into why that might be yet.
Some tangential things I've run into:

- Your very helpful template/reference script here <https://gist.github.com/redstreet/68f8ef59e4532f4de2271402238f370a> runs into a python 3.10 specific deprecation warning mentioned here <https://docs.python.org/3.10/library/asyncio-eventloop.html#asyncio.get_event_loop>. They want you to use get_running_loop() instead of get_event_loop(). More discussion here <https://bugs.python.org/issue38599>. I'm not asking for a fix or help here, just sharing.
- The original reason I moved to python3.10 is that my platform is arm64e/macOS. In short, if you are using smart-importer on arm64e with python 3.8 (or earlier), you'll end up with scikit-learn built for x86 and you'll be unable to import it. There's a lot of talk about a way to get an arm build of scikit-learn using conda, but it's a pain and I would not recommend it. Another option is to install everything for x86 and use rosetta (e.g. `arch -x86_64 ./import.sh`). The last option is using python3.10, which appears to pull in everything you need to run natively with smart-importer.

So I think I have two options: use rosetta and x86 for everything with python 3.10, or explore running natively with python 3.10 and getting fixes for the python3.10 specific issues.

On Sunday, June 5, 2022 at 10:33:18 PM UTC-7 Red S wrote:

> Hmm, I haven't come across this issue so far.
>
> It's the ofxparse library <https://github.com/jseutter/ofxparse> that
> uses BS4. I'd ask there. Indeed, they did decide
> <https://github.com/jseutter/ofxparse/pull/108> to parse this as HTML
> even though it's XML, but that code has worked fine for years now. What
> platform are you using?
>
> I'd also consider filtering out via the shell, if everything else works
> fine:
> bean-extract [blah blah...] 2> >(grep -v XMLParsedAsHTMLWarning >&2)
>
>
> On Sunday, June 5, 2022 at 6:10:35 PM UTC-7 [email protected] wrote:
>
>> Hey all,
>>
>> I'm getting the following warning:
>> venv/lib/python3.10/site-packages/bs4/builder/__init__.py:545:
>> XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using
>> an HTML parser. If this really is an HTML document (maybe it's XHTML?), you
>> can ignore or filter this warning. If it's XML, you should know that using
>> an XML parser will be more reliable. To parse this document as XML, make
>> sure you have the lxml package installed, and pass the keyword argument
>> `features="xml"` into the BeautifulSoup constructor.
>> warnings.warn(
>>
>> What I'm doing to get this:
>>
>>    - Downloading account data using ofxget as described here
>>    <https://reds-rants.netlify.app/personal-finance/direct-downloads/>
>>    - Importing that data using beancount-reds-importer (e.g. here
>>    <https://github.com/redstreet/beancount_reds_importers/blob/main/beancount_reds_importers/chase/__init__.py>)
>>
>> Things I've tried or discovered:
>>
>>    - I looked for all instances of `soup = BeautifulSoup .. ` and found
>>    the main calls in ofx.py. I tried changing these calls from feature=lxml
>>    to feature=xml, which didn't resolve the warning
>>    - I made sure lxml is downloaded
>>    - I tried to suppress the warning with warnings.filterwarnings, but
>>    that didn't work either (not sure it would be the "right" thing either)
>>    - I found an issue in an unrelated repo where they solved it by
>>    suppressing the warning, here
>>    <https://github.com/EnergieID/entsoe-py/issues/180>
>>    - I tried ofx data downloaded from both Fidelity Investments and
>>    Chase (not expecting this to be institution specific)
>>
>> Questions I have:
>>
>>    - The warning doesn't really help me understand what call into
>>    BeautifulSoup caused the warning. Any tips on how to track down where the
>>    issue is coming from?
>>    Maybe ofx.py isn't part of the issue at all
>>    - I think bean-extract is still working, but any suggestions on whether
>>    the warning should be ignored or resolved would also be appreciated
>>
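On the two questions above (where the warning comes from, and whether it can be silenced), here is a rough stdlib-only sketch rather than a definitive fix. `DemoParserWarning` and `parse_statement` are hypothetical stand-ins so the snippet runs without bs4 or ofxparse installed; in a real setup the category would be `XMLParsedAsHTMLWarning` imported from `bs4`, and the function would be whatever importer call ends up in BeautifulSoup. Promoting warnings to errors yields a full traceback, which shows the calling code instead of only the line inside bs4.

```python
import traceback
import warnings


class DemoParserWarning(UserWarning):
    """Hypothetical stand-in for bs4.XMLParsedAsHTMLWarning."""


def parse_statement(data):
    # Hypothetical stand-in for the library call that warns deep in its own code.
    warnings.warn("parsing XML with an HTML parser", DemoParserWarning)
    return data.strip()


def find_warning_origin(func, *args):
    """Run func with warnings promoted to errors; return traceback text or None."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            func(*args)
        except Warning:
            # The traceback lists every frame between the caller and warnings.warn().
            return traceback.format_exc()
    return None


def run_quietly(func, *args):
    """Run func with only DemoParserWarning ignored; other warnings still show."""
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=DemoParserWarning)
        return func(*args)


if __name__ == "__main__":
    print(find_warning_origin(parse_statement, "<OFX/>"))
    print(run_quietly(parse_statement, " <OFX/> "))
```

The same promotion to errors is available without touching any code by running the interpreter with `-W error`. A filter like the one in `run_quietly` only has an effect if it is installed before the parsing call runs.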
