-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Michael,
I just downloaded the latest neo4j-shell-tools and an export produced:

<node id="n1" labels="User,expertValidation,SeedNode" ><data key="labels">User,expertValidation,SeedNode</data><data key="id_str">269740110</data><data key="name">Andreyana Ivanova</data><data key="screen_name">adiivanova</data><data key="description">Passionate and inspiring Equality & Diversity Practitioner</data><data key="followers_count">35</data><data key="friends_count">52</data><data key="listed_count">2</data><data key="statuses_count">11</data><data key="favourites_count">0</data><data key="location">London</data><data key="time_zone">London</data><data key="utc_offset">0</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/1281406234/IMG_5547_normal.JPG</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data></node>

The "&" in "Equality & Diversity" is an example of a character that should be written as "&amp;".

Sorry for not checking my email earlier, but I wanted to create a file with several examples of the markup errors I am picking up (I have inserted an XML comment for each example).

BTW, there is an edge case where < and > should not be converted, but that is in XML processing instructions and it is unlikely anyone will encounter those in a graph database. As soon as I hit "send" an example case will hit the email list. ;-)

It has been a while since I have looked at conversion filter libraries, but I suspect there is something that would correct the character and markup errors automatically on export. HTML Tidy, I think, has that capacity.

Anyway, file attached.

Hope you are having a great week!

Patrick

On 01/20/2014 10:33 PM, Michael Hunger wrote: > Patrick, > > the xml encoding issues for <> & etc. should be addressed. > > Not sure how to deal with the control characters though. The only > thing I could think of is to write data as CDATA fields? > > Or strip them somehow upfront.
> > Michael > > On 21.01.2014 at 01:34, Patrick Durusau > <patr...@durusau.net> wrote: > > Michael, > > On 01/20/2014 02:48 AM, Michael Hunger wrote: >>>> Thanks for the feedback, will fix these issues. >>>> >>>> Do you know where the control characters came from? >>>> > > Guessing I would say that Twitter accepts pasted content. Works ok > as long as you are in the lower ASCII set but for things like > trademark (tm) and the R with a circle? Sorry, I'm real tired. > > I still have an uncorrected version of the data and will try to > fish out the lines in question. The full file is large in email > terms and most of it would not be helpful. > > I'll get some sleep and look at it in the morning. > > Will verify the problems still exist in Gephi as well with the > much smaller version of the file. > > Hope you are having a great day! > > Patrick > > >>>> Michael >>>> >>>> On 20.01.2014 at 02:59, Patrick Durusau >>>> <patr...@durusau.net> wrote: >>>> >>>> Michael, >>>> >>>> I tried out the export to GraphML today. >>>> >>>> I was using data from a Twitter feed. >>>> >>>> The first issue on trying to load into GraphML was that the >>>> "&" character was not written "&amp;" >>>> >>>> When converting files to XML, escape "&" with "&amp;", "<" as >>>> &lt; and ">" as &gt; >>>> >>>> The next several issues were control characters ^B, ^C, etc. >>>> embedded before TM and R, etc. >>>> >>>> Conversion to UTF-8 and stripping anything that doesn't >>>> convert would be nice. >>>> >>>> The parser in my Emacs must not match what is being used in >>>> Gephi because it would choke even though Emacs said all was >>>> well. >>>> >>>> Hope you are at the start of a great week! >>>> >>>> Patrick >>>> >>>> On 01/17/2014 07:49 PM, Michael Hunger wrote: >>>>>>> +1 that would be awesome >>>>>>> >>>>>>> I wanted to give it a try myself but haven't found the >>>>>>> time. >>>>>>> >>>>>>> Btw.
my neo4j-shell-tools now export Neo4j to GraphML, >>>>>>> so you can visualize your db in Gephi, would love some >>>>>>> feedback: >>>>>>> >>>>>>> https://github.com/jexp/neo4j-shell-tools/tree/20#graphml-export >>>>>>> >>>>>>> Michael >>>>>>> >>>>>>> On 18.01.2014 at 00:29, Marcelo Gagliano >>>>>>> <marcelo.gagli...@gmail.com> wrote: >>>>>>> >>>>>>>> Hi, Caleb. >>>>>>>> >>>>>>>> Did you develop that client? If so, could you share >>>>>>>> the source code? I am trying to create a similar >>>>>>>> solution, but I am not having much success. >>>>>>>> >>>>>>>> Thank you, Marcelo Gagliano >>>>>>>> >>>>>>>> >>>>>>>> On Friday, June 21, 2013 3:12:20 AM UTC-3, Caleb >>>>>>>> Jones wrote: >>>>>>>> >>>>>>>> I'm currently working on building a Java client for >>>>>>>> the Gephi streaming API and will be presenting at the >>>>>>>> Seattle Graph Meetup group. I'm aware of the Neo4j >>>>>>>> Gephi plugin >>>>>>>> (https://marketplace.gephi.org/plugin/neo4j-graph-database-support/) >>>>>>>> and plan on showing how that can be used too, but I'm curious if >>>>>>>> there are any specific Neo4j applications that would >>>>>>>> fit well with the streaming work I'm doing. >>>>>>>> >>>>>>>> One thought is to have a mode in the streaming client >>>>>>>> I'm writing that tees the streaming to both Gephi and >>>>>>>> Neo4j. Of course, someone could just stream to Gephi >>>>>>>> then export to Neo4j as well. >>>>>>>> >>>>>>>> I'm not drowning in free time to do this, so I'm >>>>>>>> looking for simple integrations to do. >>>>>>>> >>>>>>>> Thoughts? >>>>>>>> >>>>>>>> >>>>>>>> -- You received this message because you are >>>>>>>> subscribed to the Google Groups "Neo4j" group.
To >>>>>>>> unsubscribe from this group and stop receiving emails >>>>>>>> from it, send an email to >>>>>>>> neo4j+unsubscr...@googlegroups.com >>>>>>>> <mailto:neo4j+unsubscr...@googlegroups.com>. For >>>>>>>> more options, visit >>>>>>>> https://groups.google.com/groups/opt_out.
> - -- Patrick Durusau patr...@durusau.net Technical Advisory Board, OASIS (TAB) Co-Chair, OpenDocument Format TC (OASIS) Editor, OpenDocument Format TC, Project Editor ISO/IEC 26300 Former Chair, V1 - US TAG to JTC 1/SC 34 Convener, JTC 1/SC 34/WG 3 (Topic Maps) Co-Editor, ISO 13250-5 (Topic Maps) Another Word For It (blog): http://tm.durusau.net Homepage: http://www.durusau.net Twitter: patrickDurusau -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJS3sUXAAoJEAudyeI2QFGo05wP/jO9MqO1iy2XV8CVPMGStxjz nSlcY2CPLht1IS4IO/wg/4Xh7Bilk0h86lJ5V7cllmidQCl97h/OtcrUKLlddKo7 Bc15XDkq0cMRBr39OWbyQH/wblozTiyqEI9mBhyczROcY2vHomFGyBgbZazSM092 DOMMyq1rVvm5tArxiyMHbcDNob4xuxLPBBQtPfOoPRQXOHpGt+O92hg88lI4uN0c jfcyVHEVMVg/gP4vspJpc3cFIbBCAmPKdUe2fWZYjLQAoWXqqJ6MTH44GwzjqmDJ Fp2Qy0jHpVyVY0wYLcjJ+xvNrL1sXjCgBJsLP7Dt0innGYLzY9K1K6rC96yU0SUV KpGE6kAvKdbJB1O3TTWXL/iIVpyeqQUa7wfKimYD751ZxADbybD20tXNDV1FUI2r AezukYK3QBbB0sgKP2JU0HVV9JVLBQ28y2IK925yDStoqOV/b9qnQQg78Kvpzuf0 GmJU/0XdqdPumytRRlk6st5Tmd/qRku6Zy0fjNACUFB8UhmgeVATZn/ZgNnxoISH BsCid5sjpISzbldp4qhvHrWsqGHH5YDnRBcIaELq7I9EeW42gl0Zc36BD2SONbSm Okg5hThxEc72G9YYW7E6ys05OZZ/zpeATdWirKYnw+AaiKVVRGcBmK1nlulMmr/8 QOCDj1ZL5cgdiTrm4pM3 =Bxn1 -----END PGP SIGNATURE----- -- You received this message because you are subscribed to the Google Groups "Neo4j" group. To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
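The two fixes discussed in the thread above (escaping "&", "<" and ">", and stripping control characters such as ^B and ^C before writing) could be sketched roughly as follows. This is only an illustration in Python, not the neo4j-shell-tools implementation; the function name clean_xml_text is hypothetical. It assumes the values are already decoded Unicode strings, and it leans on the standard library's xml.sax.saxutils.escape rather than hand-written conversion rules. Characters in the 0x7F-0x9F range are permitted (though discouraged) by XML 1.0, so this sketch leaves them alone.

```python
import re
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 "Char" production may not appear in a
# document at all, escaped or not. This covers ^B (0x02), ^C (0x03), etc.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_xml_text(value: str) -> str:
    """Hypothetical helper: make a property value safe for element content.

    First drops characters that XML 1.0 forbids, then escapes
    "&" as "&amp;", "<" as "&lt;" and ">" as "&gt;".
    """
    stripped = _INVALID_XML_CHARS.sub("", value)
    return escape(stripped)


# Example with the description from node n1:
print(clean_xml_text("Passionate and inspiring Equality & Diversity Practitioner"))
# prints: Passionate and inspiring Equality &amp; Diversity Practitioner
```

Legitimate non-ASCII text (Korean, "≠", "™") passes through unchanged, so the output still has to be written as UTF-8 to match the encoding declared in the XML header.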
<?xml version="1.0" encoding="UTF-8"?> <graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd"> <graph id="G" edgedefault="directed"> <!-- Node n1 example of "&" not being written as "&amp;" --> <node id="n1" labels="User,expertValidation,SeedNode" ><data key="labels">User,expertValidation,SeedNode</data><data key="id_str">269740110</data><data key="name">Andreyana Ivanova</data><data key="screen_name">adiivanova</data><data key="description">Passionate and inspiring Equality & Diversity Practitioner</data><data key="followers_count">35</data><data key="friends_count">52</data><data key="listed_count">2</data><data key="statuses_count">11</data><data key="favourites_count">0</data><data key="location">London</data><data key="time_zone">London</data><data key="utc_offset">0</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/1281406234/IMG_5547_normal.JPG</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data></node> <!-- Node n6 example of "&" not being written as "&amp;" --> <node id="n6" labels="User" ><data key="labels">User</data><data key="id_str">20646711</data><data key="name">UNESCO</data><data key="screen_name">UNESCO</data><data key="description">Building peace in the minds of men & women: Official Twitter of the United Nations Educational, Scientific & Cultural Organization</data><data key="followers_count">253062</data><data key="friends_count">751</data><data key="listed_count">3569</data><data key="statuses_count">6664</data><data key="favourites_count">2984</data><data key="location"></data><data key="time_zone">Paris</data><data key="utc_offset">3600</data><data key="lang">en</data><data
key="profile_image_url">http://pbs.twimg.com/profile_images/2328492994/vrbkme9mbw8ospnkvbkv_normal.jpeg</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data><data key="url">http://t.co/5TWGcZpSBU</data></node> <!-- Node n16, see "non SGML character number 128_" immediately following "SBSSeedfund" --> <node id="n16" labels="User" ><data key="labels">User</data><data key="id_str">18039181</data><data key="name">Mark Clayton Hand</data><data key="screen_name">markchand</data><data key="description">Investing in #Oxford #startups via @SBSSeedfund • Looking into mesh networks, the US Latino market, and new models of news content delivery</data><data key="followers_count">501</data><data key="friends_count">171</data><data key="listed_count">14</data><data key="statuses_count">1108</data><data key="favourites_count">7</data><data key="location">Oxford, UK</data><data key="time_zone">Mumbai</data><data key="utc_offset">19800</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/3751715890/6defbcd4b96eacaf7802c786f804d9d1_normal.jpeg</data><data key="geo_enabled">true</data><data key="verified">false</data><data key="notifications">false</data><data key="url">http://t.co/oDnPvbzQp7</data></node> <!-- Node n23 - A different form of the "&" error.
Here the parser thinks the entity is P, in A&P without a closing delimiter --> <node id="n23" labels="User" ><data key="labels">User</data><data key="id_str">243534060</data><data key="name">Susan Mashibe</data><data key="screen_name">iMashibe</data><data key="description">BizAv Enthusiast, YGL 2011, Fortune MPW Mentee 2011, Archbishop Tutu Fellow 2009, FAA CPL and A&P Holder, etc....</data><data key="followers_count">3027</data><data key="friends_count">826</data><data key="listed_count">31</data><data key="statuses_count">4482</data><data key="favourites_count">1017</data><data key="location">Tanzania</data><data key="time_zone">Nairobi</data><data key="utc_offset">10800</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/3046803908/37e5f73e6ae5f4fb03c64eb8320c77f6_normal.jpeg</data><data key="geo_enabled">true</data><data key="verified">false</data><data key="notifications">false</data></node> <!-- Node n26 - "ÜT" non SGML character number 156 - special ASCII character --> <node id="n26" labels="User" ><data key="labels">User</data><data key="id_str">2513671</data><data key="name">Katharina Borchert</data><data key="screen_name">lyssaslounge</data><data key="description">CEO of SPIEGEL Online. Eclectic mix of business and pleasure. 
No ghostwriters harmed in the making of these tweets.</data><data key="followers_count">13678</data><data key="friends_count">1641</data><data key="listed_count">650</data><data key="statuses_count">9925</data><data key="favourites_count">967</data><data key="location">ÜT: 51.450038,6.802151</data><data key="time_zone">Berlin</data><data key="utc_offset">3600</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/30350792/lyssa300dpi_normal.jpg</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data><data key="url">http://t.co/yT4udXvp3y</data></node> <!-- Node n35 - ≠ non SGML character number 137 --> <node id="n35" labels="User" ><data key="labels">User</data><data key="id_str">22947931</data><data key="name">Sasha Rabsey</data><data key="screen_name">howfund</data><data key="description">Manager of How Fund, Interested in development. Involved with grassroots groups working with adolescent girls and young women. 
RT ≠ endorsement</data><data key="followers_count">698</data><data key="friends_count">847</data><data key="listed_count">16</data><data key="statuses_count">9199</data><data key="favourites_count">20</data><data key="location">California</data><data key="time_zone">Pacific Time (US & Canada)</data><data key="utc_offset">-28800</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/1232822482/IMG_0026_normal.JPG</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data><data key="url">http://t.co/ljxsiUjBZa</data></node> <!-- Node n58 - ™ non SGML character number 132 --> <node id="n58" labels="User" ><data key="labels">User</data><data key="id_str">1310445529</data><data key="name">TBLI CONFERENCE™</data><data key="screen_name">tbli_conference</data><data key="description">In existence for over 16 years, TBLI CONFERENCE™ is the prime annual global networking and learning event on ESG and Impact Investing.</data><data key="followers_count">237</data><data key="friends_count">717</data><data key="listed_count">6</data><data key="statuses_count">929</data><data key="favourites_count">1</data><data key="location">Amsterdam</data><data key="time_zone">Amsterdam</data><data key="utc_offset">3600</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/3442283153/7e19e92f0ec141c3c77a3b8a9976a2f6_normal.jpeg</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data><data key="url">http://t.co/WftLnqfPmM</data></node> <!-- Node n60 - As exported from Neo4j, all of the foreign characters were in special ASCII code, I corrected this block by saving to UTF-8 in Emacs and it did the conversion --> <node id="n60" labels="User" ><data key="labels">User</data><data key="id_str">400382103</data><data key="name">이재웅 (Jaewoong Lee)</data><data key="screen_name">soventure</data><data 
key="description">소셜벤처, 인큐베이팅, 벤처, 새경제, 앙트르프르눠십, 바람직한 기업지배구조, 다양성, 제주, 양성평등, 창조적혁신, 협력적소비, 집단지성, 공유경제. 요즘은 소셜벤처인큐베이터 에스오피오오엔지 @sopoong 에서 일한답니다.</data><data key="followers_count">8198</data><data key="friends_count">148</data><data key="listed_count">356</data><data key="statuses_count">2887</data><data key="favourites_count">1909</data><data key="location">제주, 서울, 그리고 넷</data><data key="time_zone">Irkutsk</data><data key="utc_offset">32400</data><data key="lang">ko</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/1611397437/image_normal.jpg</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data></node> <!-- Node n74 - Another example of the "&" being mistaken for the start of an entity, here precedes K --> <node id="n74" labels="User" ><data key="labels">User</data><data key="id_str">16890791</data><data key="name">Dhruv Lakra</data><data key="screen_name">dhruvlakra3</data><data key="description">National Award Winner, Highest Civilian Award from J&K govt., Echoing Green Fellow, Mumbai Hero Award, Helen Keller Awardee for @miraklecouriers. Give us a try.</data><data key="followers_count">1627</data><data key="friends_count">388</data><data key="listed_count">50</data><data key="statuses_count">6106</data><data key="favourites_count">108</data><data key="location">Mumbai</data><data key="time_zone">Mumbai</data><data key="utc_offset">19800</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/378800000644652078/17be03e6711b1eb4f36a2c95c95dc3a9_normal.jpeg</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data><data key="url">http://t.co/vks8gGF8nn</data></node> <!-- Node n94 - The apostrophe in world's is a "smart" character, not the standard ASCII btw, my parser says the "s" following the apostrophe is ASCII 153, Windows char set? 
--> <node id="n94" labels="User" ><data key="labels">User</data><data key="id_str">19313711</data><data key="name">Skoll World Forum</data><data key="screen_name">SkollWF</data><data key="description">Accelerating entrepreneurial approaches and innovative solutions to the world’s most pressing social issues.</data><data key="followers_count">11629</data><data key="friends_count">284</data><data key="listed_count">476</data><data key="statuses_count">3200</data><data key="favourites_count">2</data><data key="location">Oxford</data><data key="time_zone">London</data><data key="utc_offset">0</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/2841813359/111076a2172dd5b4644c6412c0fbdd8e_normal.jpeg</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data><data key="url">http://t.co/ATK7UVsnyZ</data></node> <!-- Node n110 - Another bad apostrophe "s" --> <node id="n110" labels="User" ><data key="labels">User</data><data key="id_str">759251</data><data key="name">CNN</data><data key="screen_name">CNN</data><data key="description">Bringing you breaking news and the most talked about stories. 
Join the conversation and let’s connect!</data><data key="followers_count">11213688</data><data key="friends_count">820</data><data key="listed_count">89272</data><data key="statuses_count">37141</data><data key="favourites_count">13</data><data key="location"></data><data key="time_zone">Eastern Time (US & Canada)</data><data key="utc_offset">-18000</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/378800000049679889/9097753c470683f49aa12a6c15eba5c7_normal.jpeg</data><data key="geo_enabled">false</data><data key="verified">true</data><data key="notifications">false</data><data key="url">http://t.co/Db6JkaxJ9R</data></node> <!-- Node n201 - Several errors: the first one is "'s", which we have seen before, but the second one is "–P"; both are special ASCII --> <node id="n201" labels="User" ><data key="labels">User</data><data key="id_str">68911475</data><data key="name">AsianDevelopmentBank</data><data key="screen_name">ADB_HQ</data><data key="description">The Asian Development Bank's mission is to help developing Asia–Pacific nations reduce poverty and improve their people's quality of life. ADB.</data><data key="followers_count">12714</data><data key="friends_count">1325</data><data key="listed_count">376</data><data key="statuses_count">5840</data><data key="favourites_count">40</data><data key="location">Manila, Philippines</data><data key="time_zone">Hong Kong</data><data key="utc_offset">28800</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/382205459/ADB_logo_normal.JPG</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data><data key="url">http://t.co/m6dvlL7KPB</data></node> <!-- Node n563 - The quote before "Inspires" is a smart quote as is the one at the end of that quote.
--> <node id="n563" labels="User" ><data key="labels">User</data><data key="id_str">46486541</data><data key="name">Go Inspire Go</data><data key="screen_name">GOInspireGO</data><data key="description">Go Inspire Go (GIG) is a video-based website that “Inspires Viewers to Discover & Use Their Power (Talents/Resources/Network) to Help Others.”</data><data key="followers_count">2508</data><data key="friends_count">1468</data><data key="listed_count">47</data><data key="statuses_count">4474</data><data key="favourites_count">115</data><data key="location">San Francisco</data><data key="time_zone">Pacific Time (US & Canada)</data><data key="utc_offset">-28800</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/378800000762652529/ed9b092a3a805d8ba6ea4e4a9f729470_normal.jpeg</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data><data key="url">http://t.co/rvlbqiZfRA</data></node> <!-- Node n593 - The em-dash? 
between "Fusion and the" --> <node id="n593" labels="User" ><data key="labels">User</data><data key="id_str">11204932</data><data key="name">Jorge Rivas</data><data key="screen_name">thisisjorge</data><data key="description">National Affairs Correspondent at @ThisIsFusion—the ABC-Univision joint venture.</data><data key="followers_count">3406</data><data key="friends_count">1457</data><data key="listed_count">142</data><data key="statuses_count">8577</data><data key="favourites_count">385</data><data key="location">Los Angeles</data><data key="time_zone">Pacific Time (US & Canada)</data><data key="utc_offset">-28800</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/1155380191/sideprofile_normal.jpg</data><data key="geo_enabled">true</data><data key="verified">false</data><data key="notifications">false</data></node> <!-- Node n631 - single quotes are an issue as well --> <node id="n631" labels="User" ><data key="labels">User</data><data key="id_str">204937371</data><data key="name">Left of Black</data><data key="screen_name">LeftOfBlack</data><data key="description">Prof. 
Mark Anthony Neal of Duke University offers a ‘contrarian view of blackness.” Neal interviews academics & artists for the show airing Mondays at 1:30pmEST</data><data key="followers_count">4674</data><data key="friends_count">1252</data><data key="listed_count">174</data><data key="statuses_count">3225</data><data key="favourites_count">144</data><data key="location">John Hope Franklin Center</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/1153334485/Left_of_Black_Promo_normal.jpg</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data><data key="url">http://t.co/NP6jQ16AvM</data></node> <!-- Node n786 - more odd characters --> <node id="n786" labels="User" ><data key="labels">User</data><data key="id_str">612159362</data><data key="name">šīrīn ✺ šəfīʿ</data><data key="screen_name">shereenTshafi</data><data key="description">20, American of Pakistani descent, studying IR + anthropology @JohnsHopkins ◆ Tweeting FP, race, feminism, the Middle East, South Asia & whatever else in btwn</data><data key="followers_count">1490</data><data key="friends_count">409</data><data key="listed_count">41</data><data key="statuses_count">50092</data><data key="favourites_count">3995</data><data key="location">Maryland // al barzakh</data><data key="time_zone">Eastern Time (US & Canada)</data><data key="utc_offset">-18000</data><data key="lang">en</data><data key="profile_image_url">http://pbs.twimg.com/profile_images/378800000782361602/75b54fae5e1cb340247aa2d41e624d35_normal.jpeg</data><data key="geo_enabled">false</data><data key="verified">false</data><data key="notifications">false</data><data key="url">http://t.co/sE0H2R6Qo4</data></node> <!-- These are just representative errors. My parser stopped here at 200 errors with thousands of nodes left to go. I tried to select ones that might be useful in debugging. The & to &amp; is by far the most common.
Personally I would not try to re-write the text conversion rules because there should be libraries in most languages for that. --> </graph>