Nice. Thanks for jotting down the issues on arm64, python3.10, and such!
Sounds to me like you can safely ignore/grep out that warning until it's
fixed upstream. Filing an issue with ofxparse would be good.
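If filtering inside Python is preferable to grepping stderr, something along these lines might work. It is only a sketch: it assumes the warning text still begins the way it does in the report quoted further down, and that the filter is installed before the OFX file gets parsed (for example near the top of the import config). The helper name `silence_xml_as_html_warning` is made up for illustration.

```python
import warnings

# Assumption: the warning message begins with this text, as in the report
# quoted in this thread. filterwarnings() treats `message` as a regular
# expression that must match the start of the warning text.
XML_AS_HTML_PATTERN = r"It looks like you're parsing an XML document using an HTML parser"


def silence_xml_as_html_warning():
    """Ignore only the XML-parsed-as-HTML warning; leave all others visible."""
    warnings.filterwarnings("ignore", message=XML_AS_HTML_PATTERN)


# Install the filter early, before any importer parses an OFX file.
silence_xml_as_html_warning()
```

Matching on the message avoids importing anything from bs4, at the cost of breaking silently if the wording changes in a later release.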
Related: 'ofx-summarize' is a command included in beancount-reds-importers
(bleeding edge) that you can use to inspect any ofx, get a quick and dirty
summary, and explore via a pdb shell.
Thanks for the note about asyncio as well. I made the change and cleaned up
the script. It is also now installed as a command ("bean-download") as a
part of beancount-reds-importers (bleeding edge only, for now).
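For anyone with a similar script, the general shape of that kind of asyncio change is roughly as follows. This is a sketch with hypothetical names (`download`, `download_all`), not the actual bean-download code: the entry point uses `asyncio.run()` rather than fetching a loop with `get_event_loop()`, and code inside a coroutine uses `get_running_loop()` if it needs the loop at all.

```python
import asyncio


async def download(name, delay=0.01):
    # Hypothetical stand-in for one download job (e.g. running a command).
    await asyncio.sleep(delay)
    return f"{name}: done"


async def download_all(names):
    # Inside a coroutine a loop is guaranteed to be running, so
    # get_running_loop() is safe here and raises no deprecation warning.
    loop = asyncio.get_running_loop()
    started = loop.time()
    # Run all jobs concurrently; gather() keeps results in input order.
    results = await asyncio.gather(*(download(n) for n in names))
    return results, loop.time() - started


def main():
    # asyncio.run() creates the event loop, runs the coroutine, and closes
    # the loop, so no explicit get_event_loop() call is needed.
    results, elapsed = asyncio.run(download_all(["chase", "fidelity"]))
    for line in results:
        print(line)
    print(f"finished in {elapsed:.2f}s")


if __name__ == "__main__":
    main()
```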
On Monday, June 6, 2022 at 10:27:17 AM UTC-7 [email protected] wrote:
> Your question about my platform got me thinking
>
> I set up a new venv using python3.8 (instead of 3.10) and ran without any
> warnings. Haven't looked into why that might be yet.
>
> Some tangential things I've run into:
>
> - your very helpful template/reference script here
> <https://gist.github.com/redstreet/68f8ef59e4532f4de2271402238f370a> runs
> into a python 3.10 specific deprecation warning mentioned here
> <https://docs.python.org/3.10/library/asyncio-eventloop.html#asyncio.get_event_loop>.
> They want you to use get_running_loop() instead of get_event_loop(). More
> discussion here <https://bugs.python.org/issue38599>. I'm not asking
> for a fix or help here, just sharing
> - the original reason why I moved to python3.10 is because my platform
> is arm64e/macOS. In short, if you are using smart-importer on arm64e with
> python 3.8 (or earlier) you'll end up with scikit-learn built for x86 and
> you'll be unable to import it. There's a lot of talk about a way to get an
> arm build of scikit-learn using conda but it's a pain, would not recommend.
> Another option is to install everything for x86 and use Rosetta (e.g.
> `arch -x86_64 ./import.sh`). The last option is using python3.10, which
> appears to pull in everything you need to run natively with smart-importer
>
> So I think I have two options: use Rosetta and x86 for everything with
> python 3.10, or explore running natively with python 3.10 and getting fixes
> for the python3.10 specific issues.
> On Sunday, June 5, 2022 at 10:33:18 PM UTC-7 Red S wrote:
>
>> Hmm, I haven't come across this issue so far.
>>
>> It's the ofxparse library <https://github.com/jseutter/ofxparse> that
>> uses BS4. I'd ask there. Indeed, they did decide
>> <https://github.com/jseutter/ofxparse/pull/108> to parse this as HTML
>> even though it's XML, but that code has worked fine for years now. What
>> platform are you using?
>>
>> I'd also consider filtering out via the shell, if everything else works
>> fine:
>> bean-extract [blah blah...] 2> >(grep -v XMLParsedAsHTMLWarning >&2)
>>
>>
>> On Sunday, June 5, 2022 at 6:10:35 PM UTC-7 [email protected] wrote:
>>
>>> Hey all,
>>>
>>> I'm getting the following warning:
>>> venv/lib/python3.10/site-packages/bs4/builder/__init__.py:545:
>>> XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using
>>> an HTML parser. If this really is an HTML document (maybe it's XHTML?), you
>>> can ignore or filter this warning. If it's XML, you should know that using
>>> an XML parser will be more reliable. To parse this document as XML, make
>>> sure you have the lxml package installed, and pass the keyword argument
>>> `features="xml"` into the BeautifulSoup constructor.
>>> warnings.warn(
>>>
>>> What I'm doing to get this:
>>>
>>> - Downloading account data using ofxget as described here
>>> <https://reds-rants.netlify.app/personal-finance/direct-downloads/>
>>> - Importing that data using beancount-reds-importers (e.g. here
>>> <https://github.com/redstreet/beancount_reds_importers/blob/main/beancount_reds_importers/chase/__init__.py>)
>>>
>>> Things I've tried or discovered:
>>>
>>> - I looked for all instances of `soup = BeautifulSoup .. ` and found
>>> the main calls in ofx.py. I tried changing these calls from feature=lxml
>>> to feature=xml, which didn't resolve the warning
>>> - I made sure lxml is downloaded
>>> - I tried to suppress the warning with warnings.filterwarnings but
>>> that didn't work either (not sure it would be the "right" thing either)
>>> - I found an issue in an unrelated repo where they solved it by
>>> suppressing the warning, here
>>> <https://github.com/EnergieID/entsoe-py/issues/180>
>>> - I tried ofx data downloaded from both Fidelity Investments and
>>> Chase (not expecting this to be institution specific)
>>>
>>> Questions I have:
>>>
>>> - The warning doesn't really help me understand which call into
>>> BeautifulSoup caused the warning. Any tips on how to track down where
>>> the issue is coming from? Maybe ofx.py isn't part of the issue at all
>>> - I think bean-extract is still working, but any suggestions on whether
>>> the warning should be ignored or resolved would also be appreciated
>>>
>>>
>>>