Re: XMLParsedAsHTMLWarning during import of ofx from ofxget

Colton Crivelli Mon, 06 Jun 2022 10:27:23 -0700

Your question about my platform got me thinking.

I set up a new venv using Python 3.8 (instead of 3.10) and the import ran 
without any warnings. I haven't looked into why that might be yet.


Some tangential things I've run into:

   - your very helpful template/reference script here 
   <https://gist.github.com/redstreet/68f8ef59e4532f4de2271402238f370a> runs 
   into a Python 3.10-specific deprecation warning mentioned here 
   <https://docs.python.org/3.10/library/asyncio-eventloop.html#asyncio.get_event_loop>.
   They want you to use get_running_loop() instead of get_event_loop(). More 
   discussion here <https://bugs.python.org/issue38599>. I'm not asking for 
   a fix or help here, just sharing.
   - the original reason I moved to Python 3.10 is that my platform is 
   arm64e/macOS. In short, if you use smart-importer on arm64e with 
   Python 3.8 (or earlier), you end up with scikit-learn built for x86 and 
   are unable to import it. There's a lot of talk about getting an arm 
   build of scikit-learn using conda, but it's a pain and I would not 
   recommend it. Another option is to install everything for x86 and use 
   Rosetta (e.g. `arch -x86_64 ./import.sh`). The last option is Python 3.10, 
   which appears to pull in everything you need to run smart-importer natively.
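
For the get_event_loop() deprecation in the first item, a minimal sketch of one common way to avoid the warning is to let asyncio.run() own the event loop. This is not taken from the gist: fetch_all() and its account names are hypothetical placeholders for whatever coroutine a script actually runs.

```python
import asyncio


async def fetch_all(names):
    # Hypothetical stand-in for a script's real async work.
    await asyncio.sleep(0)
    # Inside a coroutine a loop is running, so get_running_loop() is valid here.
    loop = asyncio.get_running_loop()
    assert loop.is_running()
    return [name.upper() for name in names]


def main():
    # asyncio.run() creates a new event loop, runs the coroutine to
    # completion, and closes the loop, so get_event_loop() is never needed.
    return asyncio.run(fetch_all(["chase", "fidelity"]))


if __name__ == "__main__":
    print(main())
```

Note that get_running_loop() raises RuntimeError when called outside a coroutine or callback, so it is not a drop-in replacement at module level; that is the reason for the asyncio.run() entry point in this sketch.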

So I think I have two options: use Rosetta and x86 for everything with 
Python 3.10, or explore running natively with Python 3.10 and getting fixes 
for the Python 3.10-specific issues.
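
When switching between the native and Rosetta setups, it can help to confirm which architecture a given venv's interpreter is actually running as. A small sketch using only the standard library follows; describe_interpreter() is a hypothetical helper name, and the expectation that an interpreter started under Rosetta reports "x86_64" is my assumption rather than something verified in this thread.

```python
import platform
import sys


def describe_interpreter():
    # platform.machine() reports the architecture of the running process,
    # e.g. "arm64" for a native Apple silicon interpreter.
    machine = platform.machine()
    return {
        "python": "%d.%d" % (sys.version_info.major, sys.version_info.minor),
        "machine": machine,
        "native_arm": machine in ("arm64", "aarch64"),
    }


if __name__ == "__main__":
    print(describe_interpreter())
```

Running this once inside each venv (and once via `arch -x86_64`) shows whether the interpreter matches the architecture the installed wheels were built for.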

On Sunday, June 5, 2022 at 10:33:18 PM UTC-7 Red S wrote:

> Hmm, I haven't come across this issue so far.
>
> It's the ofxparse library <https://github.com/jseutter/ofxparse> that 
> uses BS4. I'd ask there. Indeed, they did decide 
> <https://github.com/jseutter/ofxparse/pull/108> to parse this as HTML 
> even though it's XML, but that code has worked fine for years now. What 
> platform are you using?
>
> I'd also consider filtering out via the shell, if everything else works 
> fine:
> bean-extract [blah blah...] 2> >(grep -v XMLParsedAsHTMLWarning >&2)
>
>
> On Sunday, June 5, 2022 at 6:10:35 PM UTC-7 [email protected] wrote:
>
>> Hey all,
>>
>> I'm getting the following warning:
>> venv/lib/python3.10/site-packages/bs4/builder/__init__.py:545: 
>> XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using 
>> an HTML parser. If this really is an HTML document (maybe it's XHTML?), you 
>> can ignore or filter this warning. If it's XML, you should know that using 
>> an XML parser will be more reliable. To parse this document as XML, make 
>> sure you have the lxml package installed, and pass the keyword argument 
>> `features="xml"` into the BeautifulSoup constructor.
>>   warnings.warn(
>>
>> What I'm doing to get this:
>>
>>    - Downloading account data using ofxget as described here 
>>    <https://reds-rants.netlify.app/personal-finance/direct-downloads/>
>>    - Importing that data using beancount-reds-importer (e.g. here 
>>    <https://github.com/redstreet/beancount_reds_importers/blob/main/beancount_reds_importers/chase/__init__.py>)
>>
>> Things I've tried or discovered:
>>
>>    - I looked for all instances of `soup = BeautifulSoup .. ` and found 
>>    the main calls in ofx.py. I tried changing these calls from feature=lxml 
>>    to feature=xml, which didn't resolve the warning
>>    - I made sure lxml is installed
>>    - I tried to suppress the warning with warnings.filterwarnings, but 
>>    that didn't work either (not sure it would be the "right" thing either)
>>    - I found a PR in an unrelated repo where they solved it by suppressing 
>>    the warning, here <https://github.com/EnergieID/entsoe-py/issues/180>
>>    - I tried ofx data downloaded from both Fidelity Investments and 
>>    Chase (not expecting this to be institution specific)
>>
>> Questions I have:
>>
>>    - The warning doesn't really help me understand which call into 
>>    BeautifulSoup caused it. Any tips on how to track down where the 
>>    issue is coming from? Maybe ofx.py isn't part of the issue at all
>>    - I think bean-extract is still working, but any suggestions on whether 
>>    the warning should be ignored or resolved would also be appreciated
>>
>>
>>
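
On the two questions in the original post (finding which call triggers the warning, and suppressing it from Python rather than the shell), here is a hedged sketch using only the standard warnings module. DemoParserWarning and parse_statement() are hypothetical stand-ins so the example runs without bs4. In a real script the category would presumably be bs4's XMLParsedAsHTMLWarning, which I am assuming is importable from the installed bs4 version.

```python
import traceback
import warnings


class DemoParserWarning(UserWarning):
    """Hypothetical stand-in for bs4's XMLParsedAsHTMLWarning."""


def parse_statement():
    # Stand-in for the library call that emits the warning.
    warnings.warn(
        "It looks like you're parsing an XML document using an HTML parser.",
        DemoParserWarning,
    )
    return "parsed"


def find_warning_origin():
    # Promote the warning to an exception so the traceback shows the
    # full call chain that led to it.
    with warnings.catch_warnings():
        warnings.simplefilter("error", DemoParserWarning)
        try:
            parse_statement()
        except DemoParserWarning:
            return traceback.format_exc()
    return None


def parse_quietly():
    # The filter has to be installed before the call that warns.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=DemoParserWarning)
        return parse_statement()


if __name__ == "__main__":
    print(find_warning_origin())
    print(parse_quietly())
```

The same promotion to an error can be done without editing code by starting the interpreter with `python -W error`, which makes every warning raise and print a traceback.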

-- 
You received this message because you are subscribed to the Google Groups 
"Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/beancount/54ee2b54-3106-4b19-882a-84a31d914ccan%40googlegroups.com.
