I am working on a project that saves the HTML source of a target
page. I use the following code to extract the source by serializing
the DOM into a string. Does anybody know of a better way to do it?

nsCOMPtr<nsIDOMWindow> window;
nsresult rv = aWebProgress->GetDOMWindow(getter_AddRefs(window));
if (NS_SUCCEEDED(rv) && window)
{
    nsCOMPtr<nsIDOMDocument> domDoc;
    rv = window->GetDocument(getter_AddRefs(domDoc));
    if (NS_SUCCEEDED(rv) && domDoc)
    {
        /* Create a DOM serializer to get the HTML page source. */
        nsCOMPtr<nsIComponentManager> compMgr;
        rv = NS_GetComponentManager(getter_AddRefs(compMgr));
        nsCOMPtr<nsIDOMSerializer> serializer;
        if (NS_SUCCEEDED(rv) && compMgr)
        {
            rv = compMgr->CreateInstanceByContractID(
                NS_XMLSERIALIZER_CONTRACTID, NULL,
                NS_GET_IID(nsIDOMSerializer), getter_AddRefs(serializer));
        }
        nsEmbedString htmlSource;
        nsEmbedString strTitle;  /* not used in this excerpt */
        nsEmbedString strURL;    /* not used in this excerpt */
        /* Serialize only if the serializer was actually created. */
        if (NS_SUCCEEDED(rv) && serializer)
        {
            rv = serializer->SerializeToString(domDoc, htmlSource);
        }
    }
}

I found that the serialized source differs from what you can copy
out of "View Page Source". Sometimes the serialized output is
truncated at seemingly random points, or it contains extra junk. The
result is invalid HTML that does not display correctly. Has anybody
experienced this problem before? I have tested against the Yahoo Mail
and Hotmail pages that display a particular email in the inbox. With
simple HTML pages there is no discrepancy. Please advise what I can
investigate to solve the problem. Thanks,
Paulino
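
P.S. To narrow down where the two versions diverge, a small helper
along these lines might be useful. It is only a sketch:
FirstDifference is a hypothetical name, not part of any Mozilla API,
and it assumes both the serialized output and the text copied from
"View Page Source" have already been converted to UTF-8 std::string
values.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Mozilla API): returns the index of the first
// byte at which the two strings differ, or std::string::npos if they are
// identical. If one string is a prefix of the other, the length of the
// shorter string is returned, which would point at a truncation.
std::size_t FirstDifference(const std::string& a, const std::string& b)
{
    const std::size_t common = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < common; ++i) {
        if (a[i] != b[i])
            return i;
    }
    return a.size() == b.size() ? std::string::npos : common;
}
```

Printing a few dozen characters around the returned offset from both
strings should show whether the mismatch is a cut-off or inserted
text.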
_______________________________________________
dev-tech-xpcom mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-xpcom